DOCUMENT RESUME

ED 082 138                                        CS 000 728

AUTHOR          MacGinitie, Walter H., Ed.
TITLE           Assessment Problems in Reading.
INSTITUTION     International Reading Association, Newark, Del.
PUB DATE        73
NOTE            107p.
AVAILABLE FROM  International Reading Association, 6 Tyre Avenue,
                Newark, Del. 19711 (Order No. 462, $3.00 non-member,
                $2.00 member)
EDRS PRICE      MF-$0.65 HC-$6.58
DESCRIPTORS     Classroom Environment; *Criterion Referenced Tests;
                Reading; *Reading Diagnosis; Reading Instruction;
                Reading Materials; Reading Processes; Reading Skills;
                *Reading Tests; *Standardized Tests; *Test
                Interpretation; Test Results

ABSTRACT

The papers in this volume deal with a range of
assessment problems in reading. The first paper, by Karlin,
introduces the general problem of using assessment procedures to
guide teaching. The next six papers deal with various aspects of this
general problem. Otto discusses the distinction between
norm-referenced, standardized achievement tests and
criterion-referenced measures. Johnson shows how the teacher can
prepare his own criterion-referenced evaluation procedures to fit
specific objectives in word attack skills. Berg's paper documents the
difficulty in evaluating specific components of reading ability.
MacGinitie points out that the nature of what is tested in reading
changes from the lower to the higher grades. Carver critically
analyzes the relationship between reasoning and reading. Thorndike
discusses some of the problems of test interpretation. The next two
papers deal with the instructional setting and the instructional
materials. Brittain provides a checklist of points to consider when
evaluating classroom organization. A paper by Botel, Dawkins, and
Granowsky offers a way of analyzing the structures of sentences to
estimate their complexity. The last two papers, Mork's and Jason and
Dubnow's, consider the relationship between the reading ability of
the child, the material he reads, and his assessment of his reading
ability. (liR)




assessment 
problems 
in reading 

Walter H. MacGinitie, Editor 
Teachers College 
Columbia University 




INTERNATIONAL READING ASSOCIATION 
Six Tyre Avenue • Newark, Delaware 19711 



INTERNATIONAL READING ASSOCIATION 



OFFICERS 
1973-1974 

President Millard H. Black, Los Angeles Unified School District, 

Los Angeles, California 

President-Elect Constance M. McCullough, California State University,
San Francisco, California 

Past President William K. Durr, Michigan State University, 
East Lansing, Michigan 



DIRECTORS 

Term expiring Spring 1974 

William Eller, State University of New York, Buffalo, New York
William J. Iverson, Stanford University, Stanford, California 
Eunice Shaed Newton, Howard University, Washington, D. C. 

Term expiring Spring 1975

Harold L. Herber, Syracuse University, Syracuse, New York 
Helen K. Smith, University of Miami, Coral Gables, Florida 
Grace S. Walby, Child Guidance Clinic of Greater Winnipeg, 
Winnipeg, Manitoba 

Term expiring Spring 1976

Ira E. Aaron, University of Georgia, Athens, Georgia

Lynette Saine Gaines, University of South Alabama, Mobile, Alabama 

Tracy F. Tyler, Jr., Robbinsdale Area Schools, Robbinsdale, Minnesota 



Executive Secretary-Treasurer Ralph C. Staiger

Assistant Executive Secretaries Ronald W. Mitchell 

James M. Sawyer 

Business Manager Ronald A. Allen 

Director of Research Stanley F. Wanat 

Journals Editor Lloyd W. Kline

Publications Coordinator Faye R. Branca 






Copyright 1973 by the 

International Reading Association, Inc.

Library of Congress Catalog Card Number 73-84793 



PERMISSION TO REPRODUCE THIS COPYRIGHTED MATERIAL HAS BEEN GRANTED BY
International Reading Association
TO ERIC AND ORGANIZATIONS OPERATING UNDER AGREEMENTS WITH THE NATIONAL
INSTITUTE OF EDUCATION. FURTHER REPRODUCTION OUTSIDE THE ERIC SYSTEM
REQUIRES PERMISSION OF THE COPYRIGHT OWNER.



CONTENTS

v     Foreword
        Millard H. Black

1     An Introduction to Some Measurement Problems in Reading
        Walter H. MacGinitie

8     Evaluation for Diagnostic Teaching
        Robert Karlin

14    Evaluating Instruments for Assessing Needs and Growth in Reading
        Wayne Otto

21    Guidelines for Evaluating Word Attack Skills in the Primary Grades
        Dale D. Johnson

27    Evaluating Reading Abilities
        Paul Conrad Berg

35    What Are We Testing?
        Walter H. MacGinitie

44    Reading as Reasoning: Implications for Measurement
        Ronald P. Carver

57    Dilemmas in Diagnosis
        Robert L. Thorndike

68    Guidelines for Evaluating Classroom Organization for Teaching Reading
        Mary M. Brittain

77    A Syntactic Complexity Formula
        Morton Botel, John Dawkins, and Alvin Granowsky

87    The Ability of Children to Select Reading Materials at Their Own
      Instructional Reading Level
        Theodore A. Mork

96    The Relationship Between Self-Perceptions of Reading Abilities
      and Reading Achievement
        Martin H. Jason and Beatrice Dubnow






The International Reading Association attempts, through its 
publications, to provide a forum for a wide spectrum of 
opinion on reading. This policy permits divergent viewpoints 
without assuming the endorsement of the Association. 






FOREWORD 



The title of this publication, Assessment Problems in Reading,
reflects many of the present concerns and the future hopes of
reading instruction. Failures to accurately assess both pupil needs
and instructional objectives are among the causes of educational
ineffectiveness. Methods by which teacher skill in these and
related areas may be increased are matters of administrative and
legislative concern in many parts of this country and the world.

The need for assessment is present in every important area of
the instructional program. What are the strengths and weaknesses
of the pupil, the teacher, the school, and the instructional support
program? How may the effectiveness of classroom organization be
increased? How accurately has the instructional level of the pupils
been determined? How appropriately do the materials of instruc-
tion reflect the abilities and the needs of the pupils? How do the
pupils perceive themselves, the teacher, and the educational
process?

Evaluation is in a period of crisis and change. Parents, teachers
and administrators of public schools, college and university per-
sonnel, and the critics of education in general are questioning the
validity of time-honored evaluation procedures. What is the impact
of an evaluation program on the pupils it is designed to serve? Do
test scores illuminate and guide, or do they obfuscate and con-
fuse? It is difficult to conceive of effective teaching without
procedures for determining the skills the pupil possesses, the ways
in which his needs are similar to and different from those of his
peers, and the degree to which his educational progress parallels
that of other pupils of his age and grade.

Similarly, an effective teacher is perceived as selecting from
among the wide range of instructional media in terms of the
established needs of each pupil in his class and choosing those
materials to meet the needs and to reinforce the strengths of each
of these young people.

The Association is indebted to Walter MacGinitie and the
authors of Assessment Problems in Reading for the contribution
they have made in the preparation of this publication in order that
the teaching of reading may be improved through more effective
assessment of the many aspects of the instructional process.

Millard H. Black, President 
International Reading Association 

1973-1974 






Walter H. MacGinitie 



AN INTRODUCTION TO
SOME MEASUREMENT
PROBLEMS IN READING



Teachers College 
Columbia University 



The papers in this volume deal with a wide range of assessment 
problems in reading. The first paper by Karlin introduces the gen- 
eral problem of using assessment procedures to guide teaching: 
setting appropriate objectives for the child and selecting assess- 
ment procedures that will contribute to an understanding of the 
child's current capabilities and of appropriate new goals. The next 
seven papers deal with various aspects of this general problem. 
Otto discusses the distinction between norm-referenced, standard- 
ized achievement tests and criterion-referenced measures, and 
makes clear the distinctive usefulness of the latter. Johnson shows 
how the teacher can prepare his own criterion-referenced evalua- 
tion procedures to fit specific objectives in word attack skills. 
Beyond the decoding stage, it has been more difficult to develop
ways to evaluate specific components of reading ability or even to
identify these components clearly. Berg's paper clearly documents
this difficulty. MacGinitie points out that the nature of what we
teach in reading, and therefore the nature of what is tested, 
changes from the lower to the higher grades. Carver takes issue 
with the desirability of such a change and describes what he be- 
lieves to be the undesirable consequences of an emphasis on teach- 
ing and measuring the reasoning aspects of reading in the later 
grades. 

All of the foregoing papers are concerned to some degree with
differential measurement, that is, discovering whether different
aspects of reading ability can be distinguished and measured and,
if they can, how those differential measurements can be used in
guiding instruction. The paper by Thorndike provides a remark-
ably clear discussion of the basic measurement problems inherent
in trying to learn if a particular child is better at one task than
another. His clear description of the statistical relations involved,
and the simple tables that he provides for guiding diagnostic judg-
ments, should be invaluable both in making diagnoses and in evalu-
ating the usefulness of diagnostic instruments.

The next two articles turn from measuring student achieve-
ment to look at the instructional setting and the instructional
materials. Brittain's discussion and checklist of points to consider
when evaluating classroom organization will be useful not only to
the school that is planning such an evaluation but also to the
individual teacher who simply wants to think through what he
would like to accomplish through organizing his classroom for
reading instruction. That we can scale the reader's ability suggests
that we can scale the difficulty of the material he reads, and
indeed there are many procedures for assessing readability. Most
readability formulas use sentence length as an estimate of sentence
complexity. The paper by Botel, Dawkins, and Granowsky offers a
relatively simple way of analyzing the actual structures of sen-
tences to achieve estimates of their complexity.
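Because most readability formulas rely on sentence length, a small sketch may help show what that proxy can and cannot capture. This is not the Botel, Dawkins, and Granowsky formula; it is the plain sentence-length measure the paragraph contrasts it with. It assumes a naive split on end punctuation (which mishandles abbreviations such as "Dr."), and the two example sentences are hypothetical.

```python
import re


def mean_sentence_length(text: str) -> float:
    """Average words per sentence, the usual readability proxy for complexity.

    Sentences are split naively on runs of '.', '!' or '?', so abbreviations
    and decimal numbers will be miscounted.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    word_counts = [len(s.split()) for s in sentences]
    return sum(word_counts) / len(word_counts)


# Two hypothetical sentences of equal length but different structure:
# the second embeds a relative clause inside its subject.
simple = "The big brown dog ran very fast."
embedded = "The dog the cat chased ran away."
print(mean_sentence_length(simple), mean_sentence_length(embedded))
```

Both sentences receive the same seven-word score even though the second is harder to parse, which is the gap a structural analysis is meant to close.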

The last two papers in this volume consider the relationship 
between the reading ability of the child, the material he reads, and
his own assessment of his reading ability. Mork inquires whether
children can select materials that are appropriate for their level of
reading achievement, and Jason and Dubnow report a study of the 
relation between reading achievement and children's assessments 
of their own reading ability. 

One of the most persistent of the many issues raised by the
papers in this volume involves the reliability of difference scores.
Several of the papers stress the importance of diagnostic testing or
diagnostic judgments. It is important to good teaching to realize
how fallible such differential test results or judgments ordinarily
are, so that instructional decisions can be kept appropriately tenta-
tive. Most reading skills, especially the more advanced comprehen-
sion skills, are highly correlated with one another, and only when
the subskill scores are quite different from each other can diag-
nostic judgments of practical usefulness be made.
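The point about difference scores can be made concrete with the classical test-theory formula for the reliability of a difference between two scores. The sketch below assumes the two subtests have equal variances, a textbook simplification not stated in the text; the reliabilities and intercorrelations are hypothetical, chosen only to show the trend.

```python
def difference_score_reliability(r_xx: float, r_yy: float, r_xy: float) -> float:
    """Reliability of the difference X - Y, assuming X and Y have equal variances.

    r_xx, r_yy: reliabilities of the two subtests.
    r_xy: correlation between the two subtests.
    """
    if not -1.0 < r_xy < 1.0:
        raise ValueError("r_xy must lie strictly between -1 and 1")
    return ((r_xx + r_yy) / 2 - r_xy) / (1 - r_xy)


# Two subtests, each with a hypothetical reliability of 0.85,
# at increasing levels of intercorrelation.
for r_xy in (0.30, 0.50, 0.70, 0.80):
    rel = difference_score_reliability(0.85, 0.85, r_xy)
    print(f"r_xy = {r_xy:.2f}  reliability of difference = {rel:.2f}")
```

Each subtest alone is quite reliable, yet when the two correlate 0.80 the reliability of their difference falls to 0.25, which is why highly intercorrelated subskill scores support so few firm diagnostic judgments.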

Since a teacher's judgments are likely to be at least as unreli-
able and as highly intercorrelated as test subscores are, the sobering
message of Thorndike's tables applies in full measure to
teachers' judgments as well. The teacher should be at least as
tentative about diagnostic judgments formed from his own obser-
vations as those formed from test scores and be ready to change
both appraisal and treatment as new evidence warrants. The refer-
ences in several articles to the value of systematic observation,
anecdotal records, and teacher-made diagnostic instruments sug-
gest the need for more training in these skills in teacher education.

Some of the papers in this volume make clear the advantages
that criterion-referenced tests have over norm-referenced tests for
certain purposes, particularly for guiding teaching. It should be
understood, however, that criterion-referenced measures are often
used for making diagnostic judgments and are subject to related
limitations for that purpose. Giving a score that refers to some
criterion rather than to a norm group does not absolve the test
maker from showing that separate component scores index mean-
ingful skill levels or separately measurable skills. Whether the test
is criterion-referenced or norm-referenced, the teacher must recog-
nize that the label of the test is not necessarily a clear guide to
what the test measures. The problem of subtests that have differ-
ent labels but that do not actually measure different skills is ably
described in the article by Berg. Unless criterion-referenced tests
clearly demonstrate that they are relevant to different criteria,
they are likely to perpetuate the same problem. In evaluating the
distinctive contributions of criterion-referenced and norm-refer-
enced tests, it is well to remember that both types are usually
standardized in the sense that they are given with standard direc-
tions and under standard conditions. For example, the first four of
Otto's Limitations of Standardized Tests can apply to criterion-
referenced as well as to norm-referenced tests. Finally, it is just as
important in using criterion-referenced tests as in using norm-
referenced tests to be sure that the test that is used is appropriate
to the objectives that are guiding the instruction.

Two of the papers make reference to the fact that grade equiv-
alents obtained from standardized, norm-referenced reading tests
do not provide a very accurate index of the child's instructional
level. That a discrepancy exists is quite true, but the reasons for it
seem not to be generally understood. First of all, the child's in-
structional level, as determined by an informal reading inventory,
is usually based on some graded series of reading texts or on
standardized test passages. However, the materials for a particular
grade level produced by one publisher may be considerably harder
or easier than those produced by another publisher, and standard-
ized test passages used for determining instructional level have no
inherent priority over other standardized test passages. For these
reasons the instructional level as determined by one informal read-
ing inventory may not agree with a second using different mate-
rials.

Secondly, the traditional criteria for instructional level (96 
percent correct pronunciation and 60 percent comprehension) are 
quite arbitrary and not always comparable to each other. Further- 
more, the comprehension score depends on the questions asked, 
and it is clear that questions of varying difficulty may be asked 
about the same passage. The arbitrariness of the comprehension 
criterion for instructional level is particularly evident. Why does 
answering 60 percent of the questions about a passage mean that 
passage is appropriate in difficulty for the child to study? And, 
indeed, is the same criterion appropriate at all grade levels? 
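To see how much hangs on the choice of cutoffs, the criteria can be written as an explicit rule. In this sketch the default thresholds are the 96 and 60 percent figures quoted above; the stricter alternative cutoff and the pupil's scores are hypothetical, and actual informal inventories differ in how errors are counted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstructionalCriteria:
    """Cutoffs (in percent) a passage must meet to count as instructional level."""
    min_word_accuracy: float = 96.0  # percent of words pronounced correctly
    min_comprehension: float = 60.0  # percent of comprehension questions answered


DEFAULT = InstructionalCriteria()


def at_instructional_level(word_accuracy: float, comprehension: float,
                           criteria: InstructionalCriteria = DEFAULT) -> bool:
    """True if both percentages meet the given cutoffs."""
    return (word_accuracy >= criteria.min_word_accuracy
            and comprehension >= criteria.min_comprehension)


# One hypothetical reading of a passage: 97% accuracy, 62% comprehension.
print(at_instructional_level(97, 62))
# The same reading judged against a hypothetical stricter comprehension cutoff.
print(at_instructional_level(97, 62, InstructionalCriteria(min_comprehension=70)))
# The same reading scored with a harder (hypothetical) set of questions.
print(at_instructional_level(97, 50))
```

The same oral reading is judged appropriate or not depending on a cutoff and a question set that the child's performance itself does not determine, which is the arbitrariness the paragraph describes.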

Finally, instructional level and the grade score from a test are
based on opposite regression lines.* For this reason, adding or
subtracting a constant will not, as implied by some recent investi-
gators, serve to convert different grade scores to corresponding
instructional levels. The grade score will often give a fair indication
of instructional level, but the grade score is not defined in such a
way as to give the best estimate of the level of reading material
most appropriate for the child. By convention, the grade scores
from a reading test are based on the average raw scores obtained
by children at each of several different grade levels. The line
through the mean raw score points at different grade levels is
essentially the regression line for the regression of raw score on
grade level (1). If, on the other hand, one were trying to predict
the grade level of a child who has received a particular raw score,
one would be interested in the average grade level of all the chil-
dren in the normative sample who received that particular raw
score. If raw scores were assigned grade level equivalents on this
latter basis, the assignment would be based on the opposite regres-
sion line: the regression of grade levels on raw scores.

*The situation is actually more complex than this description indicates, and revising the
conventional definition of the grade equivalent would not provide a thoroughly satisfac-
tory answer. A practical solution to the problem is complicated by the fact that raw
scores should actually be plotted against appropriate level of instructional material,
rather than the actual grade level of the student, and separate regression solutions should
be obtained for students of each actual grade level. The description does not specify
whether or not the regression is linear, but that is not a relevant consideration for the
point being made. The line through the mean raw score points at different grade levels is
usually curvilinear, with smaller and smaller increments from year to year as grade level
increases.
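A small simulation shows why the two regression lines disagree. The normative sample below is invented (raw score rising linearly with grade, plus random error), and both lines are fitted as straight lines even though, as the footnote notes, real norm curves are usually curvilinear. With equal numbers of pupils per grade, the least-squares line of raw score on grade coincides with the line through the grade means, so inverting it gives the conventional grade equivalent; the other line predicts the average grade of pupils who earned a given raw score. All names and numbers are hypothetical.

```python
import random


def ols(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def simulate_norm_sample(seed=1, per_grade=200):
    """Hypothetical norm sample: raw score = 10 + 6 * grade + normal error (SD 8)."""
    rng = random.Random(seed)
    grades, raws = [], []
    for g in range(2, 7):
        for _ in range(per_grade):
            grades.append(g)
            raws.append(10 + 6 * g + rng.gauss(0, 8))
    return grades, raws


grades, raws = simulate_norm_sample()

# Conventional grade equivalent: invert the regression of raw score on grade.
b_rg, a_rg = ols(grades, raws)


def conventional_ge(raw):
    return (raw - a_rg) / b_rg


# Alternative: regression of grade on raw score.
b_gr, a_gr = ols(raws, grades)


def predicted_grade(raw):
    return a_gr + b_gr * raw


for raw in (25, 34, 46, 60):
    print(raw, round(conventional_ge(raw), 2), round(predicted_grade(raw), 2))
```

The two lines agree only at the sample means. Below the mean raw score the conventional grade equivalent is lower than the predicted grade, and above it the conventional value is higher, because the product of the two slopes equals the squared correlation, which is less than 1 whenever scores are not perfectly related to grade. No single added constant can reconcile lines that diverge in opposite directions on either side of the mean.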

Three of the articles in this collection merit special comment:
the paper by Carver because it represents a novel approach
to the measurement of reading, and the papers by Mork and by
Jason and Dubnow because they represent important beginnings in
promising areas of research.

As background for understanding Carver's article, it would be
very helpful for the reader first to read E. L. Thorndike's original
article, "Reading as Reasoning," reprinted in a recent Reading
Research Quarterly (2), and Tuinman's perceptive commentary on
it (3). The Editor solicited Carver's intriguing article, recog-
nizing that the article would be controversial, but believing that it
was appropriate to give wider currency to Carver's thought-
provoking views on the nature of reading instruction and reading
measurement. Carver attacks the generally accepted concept of
the nature of reading as exemplified by some of the work of E. L.
Thorndike and R. L. Thorndike. Since there is no reply to Carver
contained in this volume, the Editor wishes to defend the elder
Thorndike on one specific point and to suggest very briefly the
nature of some of the questions that a general defense might raise.
Carver maintains that E. L. Thorndike was interested primarily in
studying the decoding of words and the combining of word mean-
ings into an understanding of the sentence, but objects that Thorn-
dike's actual work did not concentrate on those levels. In view of
the subtitle ("A Study of Mistakes in Paragraph Reading") of
Thorndike's major article, it seems inappropriate to take Thorn-
dike to task for allowing his interest to range beyond the sentence.
The types of questions that a defense of the Thorndikian view
might raise are exemplified by the following: Is the teacher happy
with a definition of the reading process that specifically excludes
meanings that go beyond those contained in a single sentence? Is
the distinction between understanding a sentence and understand-
ing a paragraph a valid one? If Carver's view is correct, how can
one explain the high correlations between knowledge of individual
words on a vocabulary test and scores on a comprehension test? Is
not reading, as we teach it, intended to be a useful skill so that a
person who has read something should be able to do something
that he could not do before (for example, answer a question about
what he has read)?

The articles by Mork and by Jason and Dubnow are closely 
related, as they both deal with the question of what the child 
understands about his own reading ability. Mork shows that many 
children in the third and fifth grades are able to select reading 
materials that are appropriate for their level of reading ability. 
Many other children, however, select materials that are consider- 
ably too easy or too difficult, according to a readability analysis of 
the selected material. The basic question is obviously a good one, 
and Mork plans additional studies to investigate how well children 
can actually read the particular materials that they select. The 
possibilities for studying the effects of interest, motivation, and 
specific subject matter are clearly important. 

The paper by Jason and Dubnow is also concerned with chil- 
dren's evaluations of their own reading ability. The authors' under- 
lying concern is how the child's perception of himself as a reader
influences his growth in reading ability. This initial research clearly 
shows a relation between the child's perception of his reading 
ability and his tested reading achievement. One way of inter- 
preting these results is that a child has a pretty fair idea of how 
well he reads, an interpretation that conforms with Mork's find- 
ings. An alternative possibility, and one that motivates Jason and 
Dubnow's work, is that the child's perception of himself as a
reader has actually influenced his development as a reader. The
present study does not allow one to choose between these two 
interpretations, but the problem is an important one and could be 
studied by causing changes either in the perception or the ability. 

The question of how reading self-concept can be changed is in
itself an interesting basis for research. At the end of their report,
the authors draw attention to another very interesting possibility:
that the child's own perception of his reading strengths and weak-
nesses may be valuable information for planning individualized or
remedial teaching. Does the child have any diagnostic awareness of
his reading capabilities or only a global evaluation? If he does have
some sort of diagnostic awareness, what is his taxonomy of the
reading task? Could children's unstructured descriptions of their
own specific strengths and weaknesses in reading contribute to our
understanding of the process of learning to read?

These questions illustrate how a research report can provide
helpful answers to one problem and also raise new problems lead-
ing to new explorations that increase our understanding of read-
ing. The Editor hopes that each of the articles in this collection
will serve these two purposes of clarifying some issues and of
stimulating the study of others.

References

1. Gulliksen, H. Theory of Mental Tests. New York: Wiley, 1950.

2. Thorndike, Edward L. "Reading as Reasoning: A Study of Mistakes in
Paragraph Reading," Reading Research Quarterly, 6 (Summer 1971),
425-434.

3. Tuinman, J. Jaap. "Thorndike Revisited: Some Facts," Reading Research
Quarterly, 7 (Fall 1971), 195-202.






Robert Karlin 

Queens College
City University of New York 



EVALUATION FOR 
DIAGNOSTIC TEACHING 



A good reading program is one that develops the basic skills stu-
dents need in order to read, that teaches them how they can use
reading as a tool for learning, that fosters an appreciation of litera-
ture, and that develops permanent interests in reading for enjoy-
ment. These four characteristics become the objectives of our in-
structional program and at the same time serve as guidelines for
evaluating the progress children make in reading.

Reading is not a simple skill nor even a single skill. Children do
not master reading in one or two years just as they do not master
any other complex activity in a brief period of time. They learn
some reading skills and develop some attitudes toward reading as
they complete one stage of development and move into another.
What they may be able to accomplish at one point in their reading
development will not be good enough at another. This fact ex-
plains why some children can cope with early reading demands but
not later ones. It also underscores the need for continuous
evaluation and orderly reading experiences based upon such
evaluation.

We say that children learn to read; what we really mean is that
children master the skills and develop the attitudes they need in 
order to acquire the ability to read. Children with reading ability 
draw upon a body of skills that they use to understand and assimi- 
late printed messages. All children do not necessarily use the same 
skills in reading identical materials. Their levels of achievement 
and the nature of the reading task determine which ones they 
apply. Some children are more efficient than others in using their 
skills. 






To fulfill the requirements of diagnostic teaching we may de-
fine the objectives of reading programs by identifying the areas on
which teachers need to focus attention. The kind and amount of
reading growth children achieve is proportional to the degree to
which teachers manage to translate the objectives into learning
tasks and guide children in mastering them. Teachers who operate
within this framework will view the functions of testing much
differently from those whose main concern is to grade pupils.
Thus they do not ask such broad questions as how well children
identify words and know their meanings, how well they under-
stand what they read, and how well they read for information.
Instead, they realize that there are more basic questions for which
they must seek some answers in order to meet the requirements of
diagnostic teaching: How well do pupils respond to different types
of context clues? What pronunciation problems do they meet as
they use the respellings of the dictionary? How well do they
identify important ideas when they are stated and when they must
be inferred? The answers to these and other pertinent questions
help teachers decide what in the reading curriculum requires
specific treatments. Moreover, this kind of evaluation suggests
what types of instructional materials will be required and what
their levels of difficulty should be.

Diagnostic teaching benefits children who are making satisfac-
tory progress in reading. Teachers can anticipate superior results as
they work with children who are experiencing difficulties in learn-
ing to read, if there is a positive relationship between the problems
and the remedies. Inherent in the concept of diagnostic teaching is
the idea that evaluation is an ongoing activity as long as instruc-
tion continues. The teacher formulates plans from the information
he acquires about his pupils, but he knows that as he teaches he
will receive new data. It is likely that he will have to modify his
practices in order to satisfy the children's current learning needs.
Occasionally he may have to revise his practices drastically.

This need for continuing evaluation raises questions about the
initial effort to obtain information about children's reading. How
extensive should the analysis be? Should many different tests be
administered to children before instruction begins? There are dif-
ferences of opinion about these matters, but it seems reasonable to
suggest a middle course. Instead of spending many hours testing
children's reading initially, teachers can take the time to find out
where on the reading ladder children are and what some of their
reading needs appear to be. Although this information is incom-
plete, and possibly somewhat inaccurate, it can be used to plan
early reading lessons. As teachers work with children they will
confirm and revise their initial judgments and note new behaviors
that affect their reading plans. These practices are much better
than hit-or-miss, trial-and-error teaching.

Teachers can appraise children's reading by using standardized 
tests and informal measures. Each form of appraisal provides infor- 
mation that may be useful in assessing what they are doing and 
planning suitable activities for them. 

STANDARDIZED TESTS 

Most schools administer survey or diagnostic-type reading tests
which provide general information about students' reading and an
estimate of their reading achievement. The latter tests are sup-
posed to identify with greater precision what the reading strengths
and weaknesses of students are, but such is not always the case.
Although standardized reading tests suffer from a number of
weaknesses, teachers can extract some useful information from
their results.

Most standardized reading tests yield separate grade placement
or percentile scores for each section of the test. A wise teacher will
not merely be concerned with the test's total score but will want
to know how it was obtained. Thus he can determine if the pupils
are equally strong in all areas tested or if some pupils are stronger
in one area than another. Pupils may have the same total score but
obtain it in different combinations of subtest scores. This first
analysis may indicate which children need help in one or more
areas. A more careful examination of the composition of test
items and the children's responses to them might provide the
teacher with information about their specific reading require-
ments. Some reading tests which are presumably diagnostic iden-
tify the subskills so that teachers can categorize responses. Most
reading tests are not sufficiently refined to enable teachers to
make such an analysis easily, but more can be learned from the
children's responses to test items than was realized in the past.
One technique is to sit down with children and go over the test
items with them. Perhaps they can explain how they decided on
their responses. It is possible that even correct responses were
reached in inappropriate ways or that children guessed many of
the answers. Teachers may be able to discern patterns of errors by
comparing similar test items and responses. Standardized reading
tests suffer from real weaknesses, but the effect of these weak-
nesses would be lessened if teachers used them with more under-
standing.
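The first analysis described above, looking past the total to how each pupil's subtest scores compare with his own average, can be sketched as follows. The subtest names, scores, and margin are hypothetical; the sketch assumes the subtests are reported on a common scale, and in practice the margin should be wide enough to reflect how unreliable differences between correlated subtests can be.

```python
def profile_report(scores: dict[str, float], margin: float) -> dict[str, str]:
    """Label each subtest relative to the pupil's own mean subtest score."""
    mean = sum(scores.values()) / len(scores)
    report = {}
    for subtest, score in scores.items():
        if score <= mean - margin:
            report[subtest] = "relatively weak"
        elif score >= mean + margin:
            report[subtest] = "relatively strong"
        else:
            report[subtest] = "no clear difference"
    return report


# Two hypothetical pupils with the same total (90) but different profiles.
pupil_a = {"vocabulary": 30, "comprehension": 30, "word_attack": 30}
pupil_b = {"vocabulary": 40, "comprehension": 32, "word_attack": 18}

for name, scores in (("A", pupil_a), ("B", pupil_b)):
    print(name, sum(scores.values()), profile_report(scores, margin=6))
```

The totals are identical, but only pupil B shows a subtest far enough below his own average to suggest where closer examination of items and responses might begin.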

One caution is particularly important in using standardized
tests for diagnostic teaching: the grade equivalent that a test
assigns to a child's performance is usually higher than the
publisher's grade level designation of appropriate instructional
materials for the same child. Particularly for the child who is
having reading difficulties, the most effective reading materials
may be graded considerably below his grade score on the test.
Furthermore, grade scores at the lower and upper ends of the range
of possible scores are not as valid as those that fall in the middle.
These weaknesses are due to problems of test construction and
statistical treatment. Tests that cover many grades suffer more
from this weakness than those intended for one or two. In
addition, one must take the standard error of measurement into
account when interpreting test results. It is better to think of a
child's achievement as falling within a range of scores than as a
single score.
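The advice to treat a score as a range can be made concrete. Under classical test theory the standard error of measurement is commonly computed as SEM = SD × √(1 − r), where SD is the standard deviation of the test's scores and r its reliability coefficient; a band of one SEM on either side of the obtained score contains the true score roughly two times in three. The sketch below assumes hypothetical values for SD, r, and the pupil's score; in practice they would come from the test manual.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Classical SEM: sd * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def score_band(score, sem, n_sem=1.0):
    """Return (low, high): the obtained score minus and plus n_sem standard errors."""
    return score - n_sem * sem, score + n_sem * sem


# Hypothetical figures, as if read from a test manual.
sem = standard_error_of_measurement(sd=10.0, reliability=0.91)  # 10 * sqrt(0.09), about 3
low, high = score_band(45.0, sem)
print(f"SEM = {sem:.1f}; obtained score 45 -> band {low:.1f} to {high:.1f}")
```

Reporting "about 42 to 48" rather than "45" keeps the uncertainty in view when the score is used to choose materials.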

We should recognize that the kinds of reading that tests require
do not cover all the types of reading that children engage in.
Tests do not demand the sustained reading children do in school
and elsewhere. It is one thing to understand a single paragraph and
another to react suitably to a longer passage. Children ordinarily
do not read words in isolation, nor do they have to read under
timed conditions that allow little flexibility. Reading tests
offer approximations of how well children read; values they do
not possess should not be ascribed to them.

Teachers may use standardized reading tests if they understand 
what their limitations are and are able to interpret their results 
adequately. The tests permit us to speak with some objectivity 
about the reading achievement of children. 

TEACHER-MADE TESTS 

Teachers are depending more and more on their own evaluations
of children's reading. This does not mean that they merely
observe children read and in haphazard fashion decide what their



KARLIN 






reading instruction ought to be. Instead, they follow fairly well- 
established procedures to find out how well pupils are reading and 
plan their programs accordingly. 

One form of teacher-made or "informal" reading test does not 
yield a grade-placement score, but it does help teachers identify 
independent, instructional, frustration, and expectancy achievement
levels. Although there is not complete agreement on the
standards required for each achievement level, teachers will not be
too far off the mark if they adhere to a reasonable range in which 
they expect children to perform, as well as take into account ob- 
servable reactions as children read orally and silently. 

Teachers may determine from oral reading performances what 
problems pupils have in recognizing words. Some pupils may con- 
sistently omit certain inflectional endings, confuse vowel sounds, 
or fail to utilize roots in unknown words. Patterns of errors might 
be discernible and serve as a basis for planning lessons to overcome 
specific weaknesses. If silent reading is followed by suitable
questions, pupils' answers will reveal not only how well they
understand stated ideas but also how deeply they are able to probe
ideas. These analyses would be the basis for initiating instruction
and continuing the study of reading needs.

A less accurate but quick way to estimate a child's reading
achievement level is to have him read words on a list that samples
vocabulary from a graded series of books. A separate list of words
could be prepared from the vocabulary represented in readers,
social studies books, and science books. The primary word lists
would contain about twenty words each and the higher-level lists
thirty or more. If the child missed much more than 10 percent of
the words on any list, that could indicate that the materials are
too difficult for him. A comparison of results from reader and
subject lists could reveal differences in difficulty between the two.
Children may have more trouble reading science textbooks than
social studies books or readers.
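A minimal sketch of the word-list screen, assuming a hypothetical record of how many words a child missed on each list. It reports the share missed per list and flags any list above 10 percent; the text speaks of "much more than 10 percent," so a teacher might well prefer a wider margin than this strict cutoff. List names and counts are illustrative.

```python
# Hypothetical results: list name -> (words on the list, words missed).
results = {
    "reader": (30, 1),
    "social_studies": (30, 2),
    "science": (30, 6),
}

THRESHOLD = 0.10  # missing more than this share suggests the material may be too difficult


def flag_difficult(results, threshold=THRESHOLD):
    """Map each list name to True when the share of words missed exceeds the threshold."""
    return {name: missed / total > threshold for name, (total, missed) in results.items()}


for name, too_hard in flag_difficult(results).items():
    total, missed = results[name]
    verdict = "may be too difficult" if too_hard else "probably manageable"
    print(f"{name}: missed {missed} of {total} ({missed / total:.0%}), {verdict}")
```

In this invented case the science list stands out, which is the kind of reader-versus-subject difference the comparison is meant to reveal.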

Another way to estimate the difficulty children will have with
materials is to apply the cloze procedure to two or three typical
excerpts drawn from them. Every tenth word is removed from each
excerpt and replaced by a blank of standard length; the reader is
expected to supply the missing word. If he is able to supply
somewhat less than half of the missing words in the excerpts, he
can probably comprehend the material sufficiently well to profit
from instruction in it. [The research relating cloze scores to
instructional level has been done with cloze tests in which every
fifth word was deleted (1). To children, however, every-fifth-word
cloze tests appear formidable; every-tenth-word cloze tests seem
more appropriate for classroom use.]
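The cloze construction and scoring described above can be sketched briefly. This is a minimal illustration, assuming exact-word scoring (only the deleted word counts, ignoring case and punctuation) and a hypothetical cutoff of 40 percent standing in for "somewhat less than half"; the text specifies neither. The deletion interval defaults to every tenth word and can be set to 5 to mimic the research format.

```python
BLANK = "_" * 10  # every blank has the same length, so its size gives no clue


def make_cloze(text, every=10):
    """Replace every `every`-th word with a blank; return (exercise, answer key)."""
    words = text.split()
    key = []
    for i in range(every - 1, len(words), every):
        key.append(words[i])
        words[i] = BLANK
    return " ".join(words), key


def _normalize(word):
    return word.strip(".,;:!?\"'").lower()


def cloze_score(key, responses):
    """Proportion of blanks filled with exactly the deleted word."""
    if not key:
        raise ValueError("the excerpt is too short to contain any blanks")
    hits = sum(_normalize(k) == _normalize(r) for k, r in zip(key, responses))
    return hits / len(key)


CUTOFF = 0.40  # assumption standing in for "somewhat less than half"

passage = ("The children walked down to the pond after lunch. They saw three ducks "
           "and a small green frog sitting on a log near the water.")
exercise, key = make_cloze(passage)
score = cloze_score(key, ["they", "by"])
print(exercise)
print(f"score {score:.0%}:", "can probably profit from instruction" if score >= CUTOFF else "probably too difficult")
```

A real excerpt would be far longer than this two-blank toy passage, so that one lucky or unlucky guess does not swing the score.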

A teacher may gain some insights into ways children read by
studying their responses to cloze exercises. Pupils may fail to re- 
late earlier ideas provided by the text to ones offered later, or they 
may become confused by certain sentence structures. Problems 
could surface as the teacher encourages pupils to describe how 
they decided what the missing words were. 

Teachers can prepare individual and group tests to determine 
how well pupils manage specific skills. These tests should contain 
enough items to assure that each skill is adequately sampled. Care 
should be taken that the exercises require the pupils to perform 
the intended skill or demonstrate knowledge of it. These tests
would need to be prepared in the same way as the others, with sets
for each achievement level.

The aim of diagnostic teaching is to identify growth areas in 
which children are progressing satisfactorily as well as pinpoint 
others to which greater attention should be given. Teaching plans 
are based on children's reading performances and directed toward 
specific learning tasks. Initial appraisal precedes instruction and 
reveals where children are on the reading continuum. Further eval- 
uation accompanies instruction and provides teachers with infor- 
mation they need to make their teaching relevant. 

Reference 

1. Rankin, E. F., and Joseph W. Culhane. "Comparable Cloze and Multiple-
Choice Comprehension Test Scores," Journal of Reading, 13 (December
1969), 193-198.






Wayne Otto 

University of Wisconsin 
at Madison 



EVALUATING INSTRUMENTS
FOR ASSESSING NEEDS 
AND GROWTH IN READING 



Evaluating an instrument for assessing needs and growth in reading 
amounts to answering two questions. 

1. What do I want to know? 

2. Does this instrument (or technique) do the job? 

Clearly, what one wants to know will suggest the approach that
must be taken; therefore, the answer to the first question sets the
stage for answering the second one.

Here we shall consider three main approaches to assessment: 
standardized achievement tests, criterion-referenced measures, and 
informal procedures. Means for estimating pupils' capacity, al- 
though they are important in assessing needs and growth, will not 
be considered; the discussion is limited to the assessment of read- 
ing behavior. 

STANDARDIZED ACHIEVEMENT TESTS 

In general, the standardized achievement tests are norm refer- 
enced. That is, a given individual's performance is examined in 
relation to the performance of other individuals. The following 
points should be considered in choosing standardized tests. 

1. Define the purpose for testing. Standardized tests may be
given for any number of reasons: to compare class achievement
with local or national norms, to determine the current achievement
status of classes or individuals in order to learn whether
corrective or remedial steps should be taken, to screen in order to
determine the need for further testing, or to evaluate the
developmental program. When the purpose for testing is clearly in mind, a






Assessing Needs and Growth 



decision can be made as to whether a survey test or an analytical
test would best suit the purpose. Generally, survey tests are group
tests designed to provide a score that will tell the teacher how well
a class or a pupil compares with other pupils of the same age and
grade. Survey tests are typically used at the survey level of
diagnosis. Analytical tests may be either group or individual tests.
They are designed to break down the total reading performance
into specific strengths or weaknesses. Group tests have the obvious
advantage of testing more pupils in less time than individual tests,
but the latter are likely to provide much more information
regarding the idiosyncrasies of an individual's performance.

2. Locate suitable tests. From among the many tests currently
available, several will typically appear to be appropriate for the
purpose identified. Probably the most useful single source of
assistance in locating and sorting out suitable tests is The Sixth
Mental Measurements Yearbook, edited by Oscar K. Buros.
(Previous editions were published in 1938, 1940, 1949, 1953, and 1959.)
Available tests in education and psychology are listed and
described in the yearbook. Brief descriptions of such things as cost,
coverage, and source, as well as one or more critical reviews, are
included for each test.

3. Evaluate before selecting. Once the tests that appear to
meet the requirements of a given situation have been identified,
they should be carefully evaluated in terms of such things as
reliability, validity, economy, ease of administration, adequacy of the
manual, relevance of the norms provided, and appropriateness of
the content for local pupils.* A test that is reliable yields
consistent results. A test that is valid actually measures what it is
supposed to measure. The validity of a test can be estimated by
correlating individual scores on the test with performance on a
previously selected criterion task or test. The fact remains, of
course, that many highly regarded, widely used tests have only
face validity. That is, they appear to measure what they are
intended to measure.

An adequate test manual includes the following kinds of
information: 1) Clear and concise directions for administering the test.
This is important because a major reason for using a standardized
*For an extended discussion of things to consider in selecting tests, consult Standards for
Educational and Psychological Tests and Manuals, American Psychological Association,
1200 Seventeenth Street, N.W., Washington, D.C. 20036.



OTTO 






test is to secure data under stated conditions. 2) Adequate
information regarding the reliability and validity of the test. 3) Norms
based upon sound sampling procedures. That is, the sampling of
scores upon which the norms are based should be large and
distributed according to geographic location and socioeconomic areas.
4) Aids for interpretation. Provisions for profile analysis and
illustrative interpretations are useful.

Finally, a test must be readily and currently available if it is to 
be used in quantity. It should be economical: such things as initial 
cost of test booklets, whether the booklets are reusable, ease of 
scoring, and compatibility with machine scoring techniques must 
be considered. A test that is very reasonable in terms of initial cost 
could be prohibitively expensive in terms of time required for 
scoring or replacement costs. Availability of alternate forms is 
required if the test is to be used in test-retest comparisons. 

The best way to become completely familiar with a test is to
take the test yourself and then to administer it to a few children. 
There is no better way to learn about problems in administration 
and scoring and the appropriateness of the content. Specimen sets 
of tests are readily available from publishers at a reasonable cost. 
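Point 3 above notes that validity can be estimated by correlating scores on the test with performance on a previously selected criterion. A minimal sketch of that computation, using hypothetical scores for eight pupils, writes out the Pearson product-moment correlation with the standard library only. The same function applied to two forms, or two administrations, of one test gives a reliability estimate.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two paired lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores with no spread cannot be correlated")
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for eight pupils on the test and on the criterion task.
test_scores = [12, 15, 18, 20, 22, 25, 27, 30]
criterion_scores = [10, 14, 15, 21, 20, 24, 29, 28]

validity = pearson_r(test_scores, criterion_scores)
print(f"validity coefficient r = {validity:.2f}")
```

A coefficient is only as meaningful as the criterion chosen; a high correlation with a poor criterion says little about what the test actually measures.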

LIMITATIONS OF STANDARDIZED TESTS 

Standardized tests share some rather severe limitations that 
ought to be kept in mind even after the "best available" test is 
chosen. Some of the more salient limitations are given here. 

1. The very fact that a test is "standardized" in terms of
administration and scoring may make it inappropriate for use with 
certain groups or individuals. The test may be too difficult or too 
easy; items may be meaningless or placed at inappropriate levels; 
directions may be incomprehensible. 

2. The test maker's quest for brevity, which unfortunately but
pragmatically enhances the salability of tests in some circles, may
result in unrealistic time limits and a choice between depth and
breadth in sampling. Scores of children who work very slowly but
accurately are likely to be meaningless; the sampling of behavior is
likely to be superficial or constricted.

3. Group administration may work to the disadvantage of
certain individuals. The group situation combined with the
standardized conditions may invalidate the test in some instances. For
example, a child who fails to understand one or two words in a set
of directions may be unable to respond to any of the items, which
he may or may not have known.

4. The format of the test may restrict the type of items used. 
A machine scorable format, for example, virtually demands some 
form of multiple-choice items. Certain behaviors are not ade- 
quately sampled with multiple-choice items. 

5. Tests at upper grade levels assume ability at lower levels.
Thus, a pupil may be able to score at a certain base level by simply
signing his name to the test booklet. Furthermore, it is generally
acknowledged that standardized tests do tend to yield
overestimates of appropriate instructional level.

CRITERION-REFERENCED MEASURES 

Criterion-referenced measurement relates test performance to 
absolute standards, usually stated in terms of behavioral objec- 
tives, rather than to the performance of other students. Such 
measures are most useful for assessing pupils' mastery of specified 
objectives. Some of the salient contrasts between norm-referenced
(standardized achievement) tests and criterion-referenced measures 
follow. 

1. Standardized tests have a low degree of overlap with the
objectives of instruction at any given time and place. The
overlap for criterion-referenced measures is absolute, for 
the objectives of instruction are the referents. 

2. Norm-referenced tests are not very useful as aids in plan- 
ning instruction because of the low overlap just men- 
tioned. Criterion-referenced measures can be used directly 
to assess the strengths and weaknesses of individuals with 
regard to instructional objectives. 

3. Again because of their nonspecificity, norm-referenced 
tests often require skills or aptitudes that may be influ- 
enced only to a limited extent by experiences in the class- 
room. This cannot be so for criterion-referenced measures 
because the referent for each test is also the referent for 
instruction. 

4. Standardized tests do not indicate the extent to which
individuals or groups of students have mastered the spectrum
of instructional objectives. Again, there is no such
problem with criterion-referenced measures because they
focus on the spectrum of instructional objectives in a given
situation.

The main advantage of criterion-referenced measures is that
they get directly at the performance of individuals with regard to
specified instructional objectives. The sensible management of a
system of individualized instruction requires knowledge of each
pupil's performance with regard to the objectives of the system.
Such knowledge can be derived from criterion-referenced
measures.
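A minimal sketch of the kind of record such a system might keep: each pupil's result on each objective is compared with that objective's own criterion, not with other pupils' results. The objectives, item counts, and the 80 percent criterion are hypothetical.

```python
# Hypothetical objectives with their mastery criteria (proportion correct required).
criteria = {
    "initial consonants": 0.8,
    "short vowels (CVC)": 0.8,
    "final -e (CVCe)": 0.8,
}

# Hypothetical results: pupil -> objective -> (items correct, items given).
results = {
    "Ann": {"initial consonants": (5, 5), "short vowels (CVC)": (4, 5), "final -e (CVCe)": (2, 5)},
    "Ben": {"initial consonants": (3, 5), "short vowels (CVC)": (5, 5), "final -e (CVCe)": (5, 5)},
}


def unmastered(pupil_results, criteria):
    """Objectives on which the pupil fell short of that objective's criterion."""
    return [obj for obj, (right, given) in pupil_results.items() if right / given < criteria[obj]]


for pupil, res in results.items():
    print(pupil, "needs work on:", unmastered(res, criteria) or "nothing listed")
```

The output is a list of objectives per pupil rather than a rank, which is what makes it usable directly for planning instruction.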

LIMITATIONS OF CRITERION-REFERENCED MEASURES 

There are, of course, some dangers and pitfalls in
criterion-referenced measurement.

1. Objectives involving hard-to-measure qualities, such as 
appreciation or attitudes, may be slighted. 

2. Objectives involving the retention and transfer of what is
learned may become secondary to the one-time demonstration
of mastery of stated objectives.

3. Specifying the universe of tasks (determining critical
instructional objectives) to be dealt with is of extreme
importance. Good tests will do nothing to overcome the
problem of bad objectives. But note that the problem here
is no different for norm-referenced testing.

4. Determining proficiency standards can be troublesome. 
Perfect or near-perfect performance should be required if 
a) the criterion objective calls for mastery, b) the skill is 
important for future learning, c) items are objective type
and guessing is likely. Less demanding performance may be 
adequate if any of the three conditions do not prevail. 
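Two parts of point 4 can be sketched. The first function picks a standard from the three conditions; because the text can be read as requiring the strict standard when any one condition holds, or only when all three hold, the combining rule is a parameter (any by default), and the two cutoff values are hypothetical. The second function shows why guessing matters on objective-type items: it computes the binomial probability that a pupil who guesses blindly among a given number of choices reaches a given number correct.

```python
from math import comb


def proficiency_standard(calls_for_mastery, important_for_later, guessing_likely,
                         strict=1.0, relaxed=0.8, combine=any):
    """Proportion correct to require for an objective.

    `strict` and `relaxed` are hypothetical cutoffs. With combine=any, one condition is
    enough to call for the strict standard; with combine=all, all three must hold.
    """
    conditions = (calls_for_mastery, important_for_later, guessing_likely)
    return strict if combine(conditions) else relaxed


def chance_by_guessing(n_items, n_choices, needed):
    """Probability that blind guessing yields at least `needed` correct out of `n_items`."""
    p = 1 / n_choices
    return sum(comb(n_items, k) * p**k * (1 - p) ** (n_items - k)
               for k in range(needed, n_items + 1))


print(chance_by_guessing(5, 4, 4))  # 0.015625, about 1.6 percent
print(chance_by_guessing(5, 3, 3))  # about 0.21
```

With five four-choice items, a pure guesser reaches 4 of 5 only about once in 64 tries, but with three-choice items and a 3-of-5 standard the chance rises to roughly one in five, which is why lenient standards on guessable items can certify mastery that is not there.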

Fortunately, one does not need to choose between norm-refer- 
enced and criterion-referenced measures. To the contrary, the two 
types of measures ought to complement each other, with each
type chosen according to the purpose for testing.

At the present time the biggest problem with criterion-referenced
testing may be to find such tests to consider. Certainly the
movement toward criterion-referenced testing is in its infancy
compared to norm-referenced testing. As a consequence, once one
has decided to take an objectives-centered approach to instruction,
he may be confronted with the task of devising his own
criterion-referenced measures, as discussed in the section that follows.
While such a task may at first seem overwhelming, it may turn out to
be a good thing if it causes one to look a bit more carefully at what
one is doing and how to assess it.

INFORMAL PROCEDURES 

In the process of diagnosis a teacher will often find it
necessary to seek information that is not available from existing tests
or to supplement information from them. When this is so, it is up to
him to devise his own informal measuring device. Since in many
instances the teacher will want to know whether particular
students know how to do a particular task, the measurement, though
informal, is likely to be criterion-referenced. The following
sequence can serve as a guide to the effective use of informal
assessment: first, decide exactly what information is desired and
what this means in terms of observable behavior; then devise new
or adapt existing test items, materials, or situations to sample the
behavior to be evaluated; keep a record of the behavior evoked in
the test situation; analyze the information obtained; and finally,
make a judgment as to how the information fits the total picture
and how well it fills the gap for which it was intended.

Examples of some of the most useful and most used informal
devices for gathering diagnostic information, particularly regarding
strengths and weaknesses in specific skill development, follow:

1. Informal observation. The most naturalistic informal tech- 
nique for gathering diagnostic information is informal observation 
of the pupil. This technique is often overlooked, but it is one that
alert, skillful teachers can use effectively for a number of pur- 
poses—systematically observing a child's overall performance, 
learning about his interests and attitudes, finding out about his 
approaches to problem solving and to study situations, and de- 
tecting physical problems and limitations. Observing with a pur- 
pose can provide the teacher with real insight into the problems a 
child may be encountering when he attempts to follow through 
story problems in arithmetic, attack new words, or write legibly. 

2. Anecdotal records. In its simplest form, an anecdotal record
can consist of a manila folder in which work samples and
observations are kept in chronological order. The primary purpose
for keeping such a record is to help the teacher keep in mind the
developing characteristics of a child. Gradual but steady
improvement may be seen as lack of improvement if there are no readily
available checkpoints. Obviously, the record loses its value if it is
simply cluttered with an occasional drawing and general
statements like "Clyde appears to be doing better." Entries must be
dated.

3. Informal tests. Many of the books, workbooks, and periodi- 
cals designed for school use include informal, nonstandardized 
tests that can be used for quick checks of pupils' comprehension, 
writing ability, grasp of arithmetic concepts, and the like. Similar 
informal tests can be constructed by the teacher to check on 
pupils' grasp of just-presented material or to get samplings of vari- 
ous kinds of behavior. 

4. Checklists. In this general category are included such things
as interest and personality inventories; questionnaires on work
habits, interests, activities, and associates; and lists of specific skills
that can be used to check a pupil's mastery of certain areas. The
lists are a practical means for systematizing observations.

5. Informal reading inventories. In the area of reading, many
teachers use an informal reading inventory to observe a pupil's oral
and silent reading at several difficulty levels. The inventory
consists of samples from the various grade levels of a basal reader
series plus comprehension questions. Four levels of reading ability
are typically identified through the use of the inventory:
a) independent level, the level at which the pupil can read independently
with at least 99 percent accuracy in word recognition and 90
percent or better comprehension; b) instructional level, the level
at which the pupil can read with some help from the teacher;
c) frustration level, the level at which the pupil can no longer
function effectively; and d) listening capacity level, the highest level at
which the pupil can comprehend at least 75 percent of material
that is read to him.
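Once word recognition and comprehension percentages have been computed for each passage, the criteria in item 5 can be applied mechanically. In this minimal sketch the independent-level and listening-level figures are the ones given above, while the instructional-level cutoffs (95 percent word recognition, 75 percent comprehension) are assumptions added only so the example runs, since the text gives no figures for that level; anything that fails the instructional test is labeled frustration. Grade levels and scores are hypothetical.

```python
# Criteria stated in the text.
INDEPENDENT_WORD_RECOGNITION = 0.99
INDEPENDENT_COMPREHENSION = 0.90
LISTENING_COMPREHENSION = 0.75
# Assumptions for illustration only: the text gives no figures for the instructional level.
INSTRUCTIONAL_WORD_RECOGNITION = 0.95
INSTRUCTIONAL_COMPREHENSION = 0.75


def classify(word_recognition, comprehension):
    """Label one passage as independent, instructional, or frustration."""
    if word_recognition >= INDEPENDENT_WORD_RECOGNITION and comprehension >= INDEPENDENT_COMPREHENSION:
        return "independent"
    if word_recognition >= INSTRUCTIONAL_WORD_RECOGNITION and comprehension >= INSTRUCTIONAL_COMPREHENSION:
        return "instructional"
    return "frustration"


def summarize(reading, listening):
    """reading: grade level -> (word recognition, comprehension) for passages the pupil reads;
    listening: grade level -> comprehension for passages read to the pupil."""
    labels = {level: classify(wr, comp) for level, (wr, comp) in reading.items()}

    def levels(label):
        return [lvl for lvl, lab in labels.items() if lab == label]

    return {
        "independent": max(levels("independent"), default=None),
        "instructional": max(levels("instructional"), default=None),
        "frustration": min(levels("frustration"), default=None),
        "listening capacity": max(
            (lvl for lvl, comp in listening.items() if comp >= LISTENING_COMPREHENSION),
            default=None),
    }


# Hypothetical inventory results for one pupil.
reading = {1: (1.00, 1.00), 2: (0.99, 0.90), 3: (0.96, 0.80), 4: (0.90, 0.50)}
listening = {3: 0.90, 4: 0.80, 5: 0.60}
print(summarize(reading, listening))
```

The numbers only organize the record; the observable reactions during oral and silent reading that Karlin mentions still have to inform the final judgment.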

Each of the informal devices listed can be adapted in a number 
of ways to increase its applicability. Note that all of the informal 
procedures discussed lend themselves very well to criterion-referenced
measurement. Once criterion behaviors have been identified,
they can be sampled with paper-and-pencil tests or through in- 
formal procedures. 
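The two requirements stated for anecdotal records in item 2, dated entries kept in chronological order, are easy to enforce even in a simple electronic version of the folder. A minimal sketch with hypothetical dates and notes: it refuses undated entries and always reads entries back in date order, so that gradual change stays visible. The notes are deliberately specific, in contrast to a general statement such as "Clyde appears to be doing better."

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(order=True)
class Entry:
    when: date
    note: str = field(compare=False)  # entries sort by date only


@dataclass
class AnecdotalRecord:
    pupil: str
    entries: list = field(default_factory=list)

    def add(self, when, note):
        """Add an observation; every entry must carry a date."""
        if not isinstance(when, date):
            raise TypeError("every entry must be dated")
        self.entries.append(Entry(when, note))

    def chronological(self):
        """Entries in date order, whatever order they were filed in."""
        return sorted(self.entries)


record = AnecdotalRecord("Clyde")
record.add(date(1972, 11, 6), "Read grade 2 passage orally; 6 errors, mostly medial vowels.")
record.add(date(1972, 10, 2), "Read grade 2 passage orally; 11 errors, mostly medial vowels.")
for entry in record.chronological():
    print(entry.when.isoformat(), entry.note)
```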






Dale D. Johnson 
University of Wisconsin 
at Madison 



GUIDELINES FOR EVALUATING 
WORD ATTACK SKILLS 
IN THE PRIMARY GRADES 



The terms word attack, word analysis, word recognition, and
decoding are often used synonymously in reference to a cluster of 
rather diverse skills that readers employ to identify words they do 
not recognize in print. Six or seven skills are commonly described 
in reading methods textbooks within their chapters on word 
attack. These skills are: configuration (sight words), picture clues, 
phonics, syllabication, structural analysis, context, and use of the 
dictionary. 

In the writer's opinion, only three of these usual six or seven 
skills are truly word attack skills that are useful to children. Con- 
siderable evidence shows that instruction in configuration— word 
shape— (drawing little boxes around words) is probably a waste of 
everyone's time (2). The procedure is rarely used beyond the ini- 
tial weeks of first grade, and even then does little to help a child 
form generalizations that will be useful later. Likewise, dictionary 
skills— worthwhile as they are— should be treated as reference, not 
word attack skills. Picture clues are in essence no more than con- 
text clues, whereby information is gleaned from graphic or pic- 
torial context, rather than through syntactic or semantic clues. 
Finally, it seems to me that syllabication should be treated as a 
subdivision of phonics when pronunciation generalizations are 
used, and as the basis for structural analysis when morphemic 
clues are used. The key word attack skills, then, and those that 
will be the basis for the remaining discussion, are phonics, struc- 
tural analysis, and context. 

Research has shown that children typically enter their first 



JOHNSON 






grade classrooms with an oral/aural vocabulary of several thousand
words. On the other hand, most children entering a beginning
reading program cannot read more than a handful of words. Thus,
the purpose of instruction in word attack is clear. It is based on
the assumption that many words that are unfamiliar to a child in
print are, nevertheless, words that he can use or understand in
conversation. Therefore, word attack skills should enable a child
to bridge the gap from unfamiliarity in print to the meaning that
he already attaches to a word in his listening and speaking
vocabulary. This purpose is particularly true of two word attack skills:
phonics and structural analysis. The third skill, use of contextual
clues, is often a vocabulary building skill as well.

The overall goal of facility in word attack is an ever enlarging 
sight word vocabulary— a vocabulary of words recognized instantly 
in print. Smith (3) estimates that adult readers have a sight vocab- 
ulary of between 20,000 and 100,000 words. Obviously, mature 
readers did not learn each of these as a distinct sight word. Rather, 
use was made of a variety of word attack skills. Teachers of read- 
ing are challenged with helping children develop those skills that 
they will use to increase their sight vocabularies from a few words 
to tens of thousands. 

With this rationale for teaching word attack, the remainder of
the present discussion will be directed to the assessment of
children's acquisition of word attack skills. Four guidelines for
evaluating word attack ability will be presented. These guidelines are
based on two beliefs: 1) skill in word attack is essential for
developing readers and 2) the key word attack skills can be measured.

Four guidelines for evaluating word attack will be discussed: 

1. Skill in word attack should be measured through teacher-made
or published tests that use synthetic (or nonsense)
words.

2. Skill in word attack can be adequately measured through
group-administered tests.

3. Word attack tests should measure decoding, not encoding,
skills.

4. Word attack skills should be evaluated often in the primary
grades so that programs can be geared to the needs of
pupils.






Word Attack Skills



USE OF SYNTHETIC WORDS 

I have suggested that word attack tests should use synthetic
words. The rationale for this view is that unless synthetic words
are used, teachers can never be sure whether they are measuring
the specific word attack skill in question or are simply measuring
words that may be in a child's sight vocabulary.

The purpose of phonics, very simply, is to help a child
pronounce a word he doesn't recognize in print, with the reasonable
assumption that once pronounced, the word may be recognized
from the child's oral/aural vocabulary. How can phonics
generalizations be tested using synthetic words? If we are evaluating
children's use of the "hard and soft c generalization," for example, it
seems much more reasonable to use synthetic words such as ceb,
cack, cobe, and cipe than such words as cent, cat, coat, and city. If
the child pronounces the latter four correctly, can we really be
sure he has mastered the generalization?
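The rationale can be built into test preparation. A minimal sketch, assuming a hypothetical set of words the class already reads on sight: it keeps only candidate items that are not real words the children might simply recognize, then writes an answer key from the usual statement of the generalization (c followed by e, i, or y is soft, /s/; otherwise hard, /k/). The small word set stands in for whatever sight-word inventory the teacher actually has.

```python
# Hypothetical stand-in for the words this class already reads on sight.
SIGHT_WORDS = {"cent", "cat", "coat", "city", "cap", "cake"}


def synthetic_only(candidates, sight_words):
    """Keep only candidates that are not already known real words, preserving order."""
    return [w for w in candidates if w.lower() not in sight_words]


def expected_c_sound(word):
    """c before e, i, or y is soft (/s/); otherwise it is hard (/k/)."""
    if not word.startswith("c") or len(word) < 2:
        raise ValueError("expected a word beginning with c followed by another letter")
    return "/s/" if word[1] in "eiy" else "/k/"


items = synthetic_only(["ceb", "cack", "cobe", "cipe", "cent", "cat", "coat", "city"], SIGHT_WORDS)
answer_key = {w: expected_c_sound(w) for w in items}
print(answer_key)
```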

In terms of structural analysis, such synthetic words as ungate,
meatness, and footbank will more accurately assess a child's
knowledge and use of prefixes, suffixes, and root words than
would real words such as unhappy, happiness, or stoplight. We can
be sure children have not seen the synthetic words, and thus they
must attend to base words and affixes, whereas with the real words
we may merely be determining whether or not these words are within
the child's sight vocabulary.

Synthetic words are also useful when measuring the ability to
use contextual clues. For example, the use of the word cromp in
the sentence "He received a new cromp with silver handlebars,
purple pedals, and a chrome-plated chain" can provide a good
indication of the child's attention to context in understanding an
unfamiliar word. Again, where a real word (bicycle) is used,
uncertainty arises as to what is being measured.

I tend to favor teacher-made tests of word analysis, rather 
than published tests, because the tests can be constructed to meas- 
ure the specific word attack skills of interest. If we are interested 
in evaluating our success in teaching a particular skill, national 
norms are not needed. Tests can sometimes be very short, containing
four or five items to assess a generalization or subskill.

GROUP ADMINISTERED TESTS 

Educators are well aware of the many advantages of individually
administered tests and also know their principal shortcoming:
they take time. I would rather see teachers spending their
time on instruction than on measurement. But the instruction
must be based on diagnosis. Word attack tests in phonics,
structural analysis, and context clues lend themselves particularly well
to group procedures.

With a small corpus of sight words to be used as distractors, all 
important phonics generalizations can be assessed with multiple- 
choice tests. A few sample items constructed to measure long and 
short vowel generalizations might look like the following: 

"Circle the real word whose underlined letter sounds the same 
as the underlined letter in the word at the left." 

bamp apple game dare 

dape dare game apple 

rad apple dare game 

bame game apple dare 

To assess the child's use of structural analysis, children can be 
asked to divide between prefixes, base words, and suffixes in such 
words as the following: 

prehead doorest eatroom 
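Scoring such division items is simple enough to sketch. In this minimal illustration the answer key and the slash convention for recording where a pupil divided a word are assumptions; eatroom is keyed as two base words, since it contains no affix.

```python
# Hypothetical answer key: the expected division of each synthetic word.
DIVISION_KEY = {
    "prehead": ("pre", "head"),  # prefix + base word
    "doorest": ("door", "est"),  # base word + suffix
    "eatroom": ("eat", "room"),  # two base words (compound)
}


def parts(marked):
    """Split a pupil's marked word, e.g. 'pre/head', into its parts."""
    return tuple(p for p in marked.split("/") if p)


def score_divisions(responses):
    """True for each word divided exactly as in the key; missing answers count as wrong."""
    return {word: parts(responses.get(word, "")) == expected
            for word, expected in DIVISION_KEY.items()}


print(score_divisions({"prehead": "pre/head", "doorest": "doo/rest", "eatroom": "eat/room"}))
```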

Multiple-choice items can be used to evaluate the child's ability to
use context in defining an unknown (synthetic) word. For
example, in the sentence used earlier, the word in question was cromp.
Children could be asked: "A cromp is a ..."

a. bird

b. bicycle

c. teacher

Once a group test of the particular word attack skills of inter- 
est to the teacher has been constructed, administered, and scored, 
the teacher may want to test a few children individually— those 
who seem to have had the greatest difficulties. But a great deal of 
diagnostic information can be gained, and time saved, through 
using group administered tests. 
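The group-then-individual procedure can be sketched as scoring a short group test and listing the lowest scorers for follow-up. In this minimal illustration the answer key for the vowel items assumes the underlined letter in every word is the vowel a (short a in bamp and rad as in apple, long a in dape and bame as in game, with dare as an r-controlled distractor); the pupils' answers and the follow-up cutoff are hypothetical.

```python
# ASSUMPTION: the underlined letter in each word is taken to be the vowel "a".
VOWEL_KEY = {"bamp": "apple", "dape": "game", "rad": "apple", "bame": "game"}

# Hypothetical circled answers from a small group.
answers = {
    "Ann": {"bamp": "apple", "dape": "game", "rad": "apple", "bame": "game"},
    "Ben": {"bamp": "apple", "dape": "dare", "rad": "apple", "bame": "apple"},
    "Cara": {"bamp": "game", "dape": "dare", "rad": "game", "bame": "apple"},
}


def vowel_score(pupil_answers):
    """Number of items answered as in the key; unanswered items count as wrong."""
    return sum(pupil_answers.get(item) == correct for item, correct in VOWEL_KEY.items())


def follow_up(all_answers, max_score):
    """Pupils scoring at or below `max_score`, lowest first: candidates for individual testing."""
    ranked = sorted((vowel_score(a), name) for name, a in all_answers.items())
    return [name for s, name in ranked if s <= max_score]


for name, a in answers.items():
    print(name, vowel_score(a), "of", len(VOWEL_KEY))
print("test individually:", follow_up(answers, 2))
```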

DECODING, NOT ENCODING 

One of the major problems with many word attack tests,
particularly tests of phonics ability, is that they involve encoding,
or spelling, rather than decoding, or pronouncing. The sets of
grapheme-phoneme correspondences for encoding and decoding
are often quite different. For example, if you were asked to write
/kɔθ/ on your paper, you would have two choices for the initial
consonant, coth or koth (coth would be proper because /k/ is
spelled c in initial position except in borrowed words), and several
choices for the medial vowel: coth (scoff), cauth (cause), or coath
(broad). On the other hand, if you were shown the word coth and
asked to pronounce it, other choices are available: /kɑθ/ (mop),
/koθ/ (both), /kɔθ/ (moth), or /kʌθ/ (mother). The point is clear:
decoding and encoding correspondences are not always
bidirectional. Therefore, tests in which the examiner reads a synthetic
word and the children are asked to respond, either by choosing
from among alternatives or in writing, are not accurately measuring
word attack decoding skills. The example items presented earlier
are based on decoding and should be the type used in
reading. Read (1) found that young children could write (encode)
a number of words but later could not pronounce (decode) their
own spellings.
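The asymmetry can be shown with the very correspondences listed above. In this minimal sketch, two small hand-made tables hold only the choices named in the text, one from sounds to spellings and one from spellings to sounds. Expanding each shows that the pupil who must spell /kɔθ/ and the pupil who must pronounce coth face different sets of options. The tables are illustrative fragments, not a description of English correspondences as a whole.

```python
from itertools import product

# Encoding (sound -> spellings), with only the choices given in the text.
ENCODE = {
    "/k/": ["c", "k"],
    "/ɔ/": ["o", "au", "oa"],  # as in scoff, cause, broad
    "/θ/": ["th"],
}

# Decoding (spelling -> sounds), with only the choices given in the text.
DECODE = {
    "c": ["/k/"],
    "o": ["/ɑ/", "/o/", "/ɔ/", "/ʌ/"],  # as in mop, both, moth, mother
    "th": ["/θ/"],
}


def spellings(phonemes):
    """Every spelling the encoding table allows for a sequence of sounds."""
    return ["".join(parts) for parts in product(*(ENCODE[p] for p in phonemes))]


def pronunciations(graphemes):
    """Every pronunciation the decoding table allows for a sequence of spellings."""
    return ["/" + "".join(s.strip("/") for s in parts) + "/"
            for parts in product(*(DECODE[g] for g in graphemes))]


print(spellings(["/k/", "/ɔ/", "/θ/"]))
print(pronunciations(["c", "o", "th"]))
```

A spelling test built on the first table and a reading test built on the second sample different knowledge, which is the reason for insisting on decoding items when the goal is pronunciation.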

There is nothing wrong with testing encoding if one is
interested in spelling ability, but testing should fit the purpose of the
instruction. To measure children's ability to use phonics
generalizations in pronouncing unfamiliar printed words, tests requiring
decoding should be used. Failure to do so could cause teachers to
plan instructional programs that do not match the skill needs of
their pupils.


FREQUENT EVALUATION

Last, but certainly not least, it seems imperative that children's progress in the development of word attack skills be evaluated regularly and frequently. Surely most primary grade children will experience word attack instruction every week. Beginning early in grade one, instruction in invariant consonants and regular vowel patterns will be underway. Later, generalizations regarding variant consonants, consonant clusters, vowel clusters, and syllabication will be introduced. By second grade, children should be developing their use of structural analysis and contextual analysis. For a sound developmental reading program to flourish, it will be essential that word attack skills be assessed often. Ideally, assessment should be done after each phonics generalization, structural analysis clue (base words and affixes), and contextual strategy (pictorial, semantic, and syntactic) has been taught. Frequent, planned assessment of the word attack skills will enable the teacher to design needed instructional activities geared to the individual characteristics of the pupils.
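As an illustration only, the bookkeeping behind such frequent assessment can be sketched in Python. The pupils, the responses, and the 80 percent mastery cutoff are hypothetical assumptions (the text sets no cutoff); the sketch scores each pupil on each recently taught skill from a group test of synthetic-word items and groups those below the cutoff by skill, which is one way a teacher might plan the instructional activities described above.

```python
from dataclasses import dataclass

# Hypothetical cutoff: the text recommends frequent testing but sets no mastery level.
MASTERY_CRITERION = 0.8


@dataclass(frozen=True)
class ItemResult:
    pupil: str
    skill: str      # the generalization just taught, e.g. "vowel clusters"
    correct: bool   # response to one synthetic-word decoding item


def skill_scores(results):
    """Return the proportion correct for each (pupil, skill) pair."""
    tallies = {}
    for r in results:
        right, total = tallies.get((r.pupil, r.skill), (0, 0))
        tallies[(r.pupil, r.skill)] = (right + int(r.correct), total + 1)
    return {key: right / total for key, (right, total) in tallies.items()}


def reteaching_groups(results, criterion=MASTERY_CRITERION):
    """Group pupils scoring below the criterion by skill, for planning instruction."""
    groups = {}
    for (pupil, skill), score in skill_scores(results).items():
        if score < criterion:
            groups.setdefault(skill, []).append(pupil)
    return {skill: sorted(pupils) for skill, pupils in groups.items()}


# Example: one week's group test on two skills (invented responses).
week = (
    [ItemResult("Ann", "vowel clusters", c) for c in (True, True, True, True, False)]
    + [ItemResult("Ben", "vowel clusters", c) for c in (True, False, True, False, True)]
    + [ItemResult("Ben", "initial consonants", True) for _ in range(5)]
)
print(reteaching_groups(week))  # {'vowel clusters': ['Ben']}
```

Keeping the results item by item, rather than as a single total, is what lets the same weekly test point to a specific generalization for each pupil.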

SUMMARY 

Four guidelines have been offered: 1) use synthetic words, 2) construct group tests, 3) test decoding, not encoding, abilities, and 4) evaluate frequently. Together they should provide a framework for continuing evaluation of the essential word attack skills. Word attack tests should be geared to the specific skills being taught. Word attack tests are not difficult to construct and can provide valuable information about the degree of success the word attack program is having. Testing should be done often, at least weekly.

Word attack skills are essential for children developing their reading ability. The wise teacher will continue to evaluate these skills so that the instructional program can be most fruitful. It is hoped that the guidelines suggested here will contribute to that end.

References 

1. Read, Charles. "Pre-School Children's Knowledge of English Phonology," Harvard Educational Review, 41 (February 1971), 1-34.

2. Samuels, Jay S. "Modes of Word Recognition," in Harry Singer and Robert B. Ruddell (Eds.), Theoretical Models and Processes of Reading. Newark, Delaware: International Reading Association, 1970, 23-37.

3. Smith, Frank. Understanding Reading. New York: Holt, Rinehart and Winston, 1971, 146-148.






Paul Conrad Berg 
University of South Carolina 



EVALUATING READING ABILITIES 



Professionals in measurement tell us that if a thing exists, it can be
measured. Reading specialists have been measuring and evaluating
bits and pieces of reading abilities ever since William S. Gray first
published The Gray Standardized Oral Reading Paragraphs in 
1915. Also in 1915, Starch reported a silent reading test that he 
had devised, and with it he postulated the chief elements of read- 
ing to be comprehension, speed, and pronunciation (19). By 1921, 
Gray had effectively stated the case for silent reading, and many 
silent reading tests were being published, using Starch's postulated 
factors. Thorndike, in 1917, added the first recorded study of 
reading as reasoning to this growing area for research (21). 

The idea of separate, definable skills grew so rapidly that by 1945, Burkart (4) reported that her survey of the literature on reading instruction indicated that "Reading is not a single act but a complex activity made up of at least 214 separate abilities. These abilities are motor, sensory, or intellectual in nature."

Today, some fifty years after the first studies in evaluation by Gray and Starch, a review of reading tests turns up well over 70 reading abilities that publishers tacitly imply are defensible as separate factors. Buros' Reading Tests and Reviews (5), published in 1968, contains some 500 pages devoted to the task of describing and evaluating published reading tests, and the latest edition of Buros' The Seventh Mental Measurements Yearbook (6) adds a hundred more pages of technical data on reading tests.

The purpose behind testing and evaluation was (and is) not simply to list the reading abilities of the single student or groups of readers. Through testing, remedial reading techniques developed and individualized instruction came into being. The overall improvement in the teaching of reading also was promoted by this evaluation movement.

With literally dozens of reading skills hypothesized as measurable, we have seen an astonishing outpouring of workbooks, kits, and visual aids of every description that purport to teach the skills as measured by the reading tests. But what if the separate skills that we claim to teach, such as retention of details, ability to determine the intent of the writer, ability to grasp the general idea, and so on ad infinitum, do not in reality exist as separate measurable factors, at least as measured by our present reading tests? What would this suggest, in effect, about the materials that are specifically created to improve these separate skills?

Before a reading test can claim to measure some particular ability in reading, the existence of that ability must first be demonstrated by an appropriate statistical analysis. What have such analyses indicated? One of the first attempts at a factor analysis of reading comprehension ability was made by Traxler (23) in 1941. He sought to discover whether or not the separate parts of the VanWagenen-Dvorak Diagnostic Examination of Silent Reading Abilities did indeed yield "measures which are independent enough to warrant their separate measurement and use as a basis for diagnostic and remedial work." After administering the test to 116 tenth grade students, Traxler stated that the five sections of the test appeared to be measuring the same abilities, and he doubted that the separate scores contributed anything over the total reading level score.

Also in 1944, a factorial study of reading abilities was made by
Davis (7). Of the nine variables hypothesized by Davis, five were 
found to meet his criteria for stability and order as variables. 
These factors were knowledge of word meanings, verbal reasoning, 
sensitivity to implications, following the structure of a passage, 
and recognizing the literary techniques of the writer. In 1946 
Thurstone (22) reanalyzed Davis' data and concluded that Davis 
had no statistical ground for his claim, but that there was only a 
single general factor comprising reading ability. Hall and Robinson 
(10), in a 1945 factorial study, identified attitude of comprehen- 
sion accuracy, rate of inductive reading, word meaning, rate for 
reading unrelated facts, and chart reading skill as separate factors. 






Stolurow and Newman (20) in 1959 identified only semantic difficulty (words) and syntactical difficulty (sentences) as factors determining the reading difficulty level of passages.

Hunt (11) and Alshan (1) also attempted factor analyses of reading comprehension, using test items from the Davis studies to build their research instruments. Hunt concluded from his results that only two skills in reading comprehension were factorially defensible: word knowledge and paragraph comprehension. Alshan was also unable to substantiate the hypothesis that five different skills of comprehension were independently factorable from the Davis items.

Later, Davis (8) again attempted a statistical analysis of reading comprehension, using a much improved technique of cross-validation uniqueness analysis based on a sample of 1,100 high school seniors in the Philadelphia area. Eight separate skills were selected for study after an analysis of previous research, including the Davis, Hunt, and Alshan studies. Five skills were found to show a significant degree of independence. The skill accounting for the largest percentage of variance was "memory for word meanings," with 32 percent of the total variance of the eight variables. Next, in order, were drawing inferences from content (20 percent of the variance); following the structure of a passage (14 percent); recognizing a writer's purpose, attitude, tone, and mood (11 percent); and finding answers to questions asked explicitly or in paraphrase (10 percent).

In 1969, Schreiner, Hieronymus, and Forsyth (18) reported a
carefully conducted experiment on the reading comprehension of 
fifth grade pupils in nine Iowa public schools. The purpose of the 
study was to provide classroom teachers with information relative 
to what traits of comprehension are measurable so that useful 
diagnostic tests could be provided. The eight factors investigated 
were: speed of noting details, speed of reading, paragraph meaning, 
determining cause and effect, reading for inferences, selecting the 
main idea, verbal reasoning, and listening comprehension. Only 
four factors were found to be statistically definable for diagnostic 
purposes: speed of reading, listening comprehension, verbal reasoning (classification of words), and speed of noting details.

Benz and Rosemier (3) in 1968 reported a study that attempted to equate certain word analysis skills with comprehension. Using the Gates Level of Comprehension Test as the criterion
and the subtests of the Bond, Clymer, and Hoyt Silent Reading 
Diagnostic Tests as the predictor variables, they found that the 
subtests having the greatest relationship to the criterion were 
words in context, rhyming sounds, and syllabication. Subtests 
having low statistical relationship to comprehension were root word, word elements, and beginning sounds.

There are many more studies in the literature that support the same generalization: there are few consistent findings relative to a large number of statistically identifiable separate reading abilities. This review also suggests that research in measurement and classroom practice in measurement have little in common. If one were to take a rough average of the number of factors that researchers suggest can be measured independently, one would come up with a number somewhere between two and five. Lennon (14), writing on the same subject in 1962, suggested that only four factors could be measured reliably: 1) a general verbal factor, 2) comprehension of explicitly stated material, 3) comprehension of implicit or latent meaning, and 4) appreciation. Although several studies published after Lennon's review have been discussed here, four factors may still be close to the number that can be measured. Yet, as already stated, a review of reading tests turns up 70 or 80 factors that various tests implicitly claim to measure.

Even though standardized tests of comprehension may not separately measure the several skills they claim to measure, it is possible that from a pragmatic or practical point of view such tests have some value for reading instruction. Obviously, teachers do not teach "pure" skills in isolation any more than tests can measure them. Therefore, it is possible that the teacher who gives the tests gets from the data the kind of information that is needed to improve instruction, even though neither the test results nor his teaching deal with precisely defined or measured skills. The question of evaluation effectiveness is really meaningless, however, unless we see what effect evaluation has on the instructional materials and practices that are, at least in part, an outcome of differential testing. That is, does testing make a difference for instruction in any way, either by making the materials more focused and effective or by significantly changing instruction?

It seems evident that the same subjective rationalization that helped produce our measurement techniques is also responsible for the methods and materials that we use in teaching reading comprehension. Some researchers have stated that methods and materials for teaching reading are no more scientific than the a priori pronouncements that preceded the research of this period. For example, in 1941, Laycock and Russell (13) reported that an analysis of reading improvement manuals revealed that few of them had any basis in research findings for their suggested reading improvement. While the findings of this early study are perhaps not surprising, in 1950 Robinson (16) made the same charge when he stated that "no particular professional acuity is required to penetrate the superficiality of the types of exercises and treatment that characterize most of these volumes." Atwater (2), in 1968, studied eleven popular reading improvement workbooks used at the college level to determine, first, whether the skills they claimed to improve were actually defined and, second, what aspects of the defined skills were actually taught and measured in the workbook exercises. He found, for example, that definitions of comprehension covered the range of ambiguity from simply the word "understanding" in one workbook to "an ability to grasp the author's thought structure as an organized whole" in another. From 80 to 95 percent of the exercises in the workbooks, including questions, dealt with factual, detailed information. One workbook, for example, claimed that comprehension included knowledge of structure and style of writing. The one question in the workbook that was meant to measure this skill was "How many paragraphs are in the preceding article?"

And yet teaching, to be successful, must be directed. A teacher must know what his pupils can and cannot do in terms of common behavioral objectives in reading. Overreliance on published tests can create a false sense of having information that one does not in fact have. Evaluation is much more than testing: it must include a variety of observations. One important type of observation is guided by teachers' daily questions. Skillful questioning by teachers is not only an art of evaluation, but also a part of good teaching. A facet of this function is learning the art of asking skillful questions and leading the student to develop a questioning attitude about everything that he does. How skilled are teachers in this respect? Floyd (9) studied the verbal activity in the classes of 40 teachers selected from administrative ratings as the best teachers in a city school system. He recorded a significant amount of verbal activity in these teachers' classes and separated
out for analysis the verbal activity dealing with teacher-student questions. Of all the questions asked, 96 percent were asked by the teachers and only 4 percent by students. Only 5 percent of the teachers' questions (one in 20) demanded a thought answer or seemed capable of creating any stimulation or reflection on the part of the student. Eighty-five percent of the teacher-initiated questions fell into only two categories: memory for facts and information. Almost all could be answered by "yes" or "no" or by simple repetition of a stated fact. Questions dealing with problem solving, the student's interests, or helping to locate student problem areas in learning were almost never asked. In another study (15), 190 teachers were asked to list as many reasons or purposes as they could think of for asking questions. Fewer than three reasons were given per teacher. Only 19 of the 190 went beyond the need to ask simple, factual questions. Only 10 percent said that teacher questions should require pupils to use their facts to make generalizations and inferences. Satlow (17), by comparison, lists 120 reasons for asking questions. These are just a few of his suggestions: "Do you challenge students by asking questions that arouse their curiosity for further knowledge? Do your questions stimulate thinking on the part of students and help to develop in them effective methods of attack? Do they help guide wholesome interaction among the students? Do your questions disclose the degree to which a spirit of inquiry has been established? Do they place the burden of thinking on the students?"

SUMMARY 

Knowing how our students learn takes more than evaluating a compilation of scores from a series of standardized tests. Such tests do give us information for instruction that would be difficult or time consuming to get otherwise. But to complete the picture of a student's learning pattern, become skilled in the informal, observational inventory technique that makes you a diagnostic teacher, the best kind there is. Kress (12) summarizes this conclusion, describing daily diagnostic teaching techniques under the headings of general observation and observation of listening situations, speaking situations, and reading situations. As the child listens, for example, can he follow directions? Can he "picture"
things described in words? In speaking situations, does he use past experiences, logical argument, supporting evidence? Is he consistent or inconsistent? In reading situations, a multitude of observations can be made: Can he find a fact or idea by skimming? Does he try to size up organization? Does he use aids, such as graphs and charts?

And so we have come full circle, back to the teacher as the
one who makes the difference. The summation of excellence has
not changed for two thousand years. Tests and materials cannot 
duplicate teacher excellence or substitute for it. Through it the 
human equation remains the master. 

References 

1. Alshan, L. M. "A Factor-Analytic Study of the Items Used in the Measurement of Some Fundamental Factors of Reading Comprehension," unpublished doctoral dissertation, Teachers College, Columbia University, 1964.

2. Atwater, J. "Toward Meaningful Measurement," Journal of Reading, 11 (March 1968), 429-434.

3. Benz, Donald A., and Robert A. Rosemier. "Word Analysis and Comprehension," Reading Teacher, 21 (March 1968), 558-563.

4. Burkart, Kathryn U. "An Analysis of Reading Abilities," Journal of Educational Research, 38 (February 1945), 430-439.

5. Buros, Oscar K. (Ed.). Reading Tests and Reviews. Highland Park, New Jersey: Gryphon Press, 1968.

6. Buros, Oscar K. (Ed.). The Seventh Mental Measurements Yearbook. Highland Park, New Jersey: Gryphon Press, 1972.

7. Davis, Frederick B. "Fundamental Factors of Comprehension in Reading," Psychometrika, 9 (September 1944), 185-197.

8. Davis, Frederick B. "Research in Comprehension in Reading," Reading Research Quarterly, 3 (Summer 1968), 499-545.

9. Floyd, William D. "Do Teachers Talk Too Much?" Instructor, 78 (October 1968), 53.

10. Hall, W. E., and F. P. Robinson. "An Analytical Approach to the Study of Reading Skills," Journal of Educational Psychology, 36 (October 1945), 429-442.

11. Hunt, Lyman C., Jr. "Can We Measure Specific Factors Associated With Reading Comprehension?" Journal of Educational Research, 51 (November 1957), 161-171.

12. Kress, Roy A. "Classroom Diagnosis of Comprehension Abilities," Conference on Reading, University of Pittsburgh Report, 22 (1966), 33-41.

13. Laycock, Samuel R., and David H. Russell. "An Analysis of Thirty-Eight How-to-Study Manuals," School Review, 49 (May 1941), 370-379.






14. Lennon, Roger. "What Can Be Measured?" Reading Teacher, 15 (March 1962), 326-337.

15. Pate, Robert T., and Neville H. Bremer. "Guiding Learning Through Skillful Questioning," Elementary School Journal, 67 (May 1967), 417-422.

16. Robinson, H. Alan. "A Note on the Evaluation of College Remedial Reading Courses," Journal of Educational Psychology, 41 (February 1950), 83-96.

17. Satlow, David. "120 Questions About Your Questioning Technique," Business Education World, 49 (February 1969), 20-22.

18. Schreiner, Robert L., A. N. Hieronymus, and Robert Forsyth. "Differential Measurement of Reading Abilities at the Elementary School Level," Reading Research Quarterly, 5 (Fall 1969), 84-99.

19. Smith, Nila Banton. American Reading Instruction. Newark, Delaware: International Reading Association, 1965.

20. Stolurow, L. M., and R. J. Newman. "A Factorial Analysis of Objective Features of Printed Language Presumably Related to Reading Difficulty," Journal of Educational Research, 52 (March 1959), 243-251.

21. Thorndike, Edward L. "Reading as Reasoning: A Study of Mistakes in Paragraph Reading," Journal of Educational Psychology, 8 (June 1917), 323-333.

22. Thurstone, L. L. "Note on a Reanalysis of Davis' Reading Tests," Psychometrika, 11 (September 1946), 185-188.

23. Traxler, Arthur E. "A Study of the VanWagenen-Dvorak Diagnostic Examination of Silent Reading Abilities," Educational Records Bulletin, No. 31 (January 1941). New York: Educational Records Bureau, 33-41.






Walter H. MacGinitie 

Teachers College 
Columbia University 



WHAT ARE WE TESTING?* 



Standardized reading achievement tests usually consist of at least 
two subtests— a vocabulary subtest and a comprehension subtest. 
Other subtests are also often included— for example, a test of read- 
ing speed. Or the vocabulary test may be subdivided into two 
different types of vocabulary tests, or the comprehension subtest 
may be divided into two or more different types of comprehen- 
sion tests. What is it that is being tested by these vocabulary and 
comprehension subtests and by the further breakdown of vocabu- 
lary or comprehension? 

The first point is that there is as much of a difference between 
different educational levels of the same subtest as there is between 
subtests with different names at the same level. The great changes 
that take place in arithmetic achievement tests from one grade to 
another are self-evident to most people. To score well on an arith- 
metic test for the sixth grade, a student must know a lot of things 
about decimals and fractions that have no bearing on performance 
on a test for the second grade. Most teachers and researchers are 
now also aware that what is measured by so-called intelligence 
tests changes considerably from the infant level to the interme- 
diate grades. In contrast, the rather large change in the content of 
reading tests from the first to the later grades is frequently not 
taken into account. Although most people readily see or already 
recognize the different requirements posed by reading tests at different grade levels, they seem seldom to consider these differences when interpreting research findings or a child's educational status.

*Adapted from a paper presented to the Thirty-Fifth Annual Conference of the Educational Records Bureau, New York, October 1970.

Grade changes in reading tests are most obvious in the vocabulary subtest. The easiest items for the first grade usually use simple words, well known to all children in speech. The distractors, or wrong answers from which the children may choose, may all look and sound quite different from the right answer and be quite unrelated in meaning. In slightly harder questions, the distractors will present possible perceptual confusions, so that if the right answer is house, distractors might be horse or mouse. The vocabulary questions gradually are made more difficult by using words that are less likely to be known as sight words or words that include more difficult letter combinations.

Eventually, as the items get more difficult, the main difficulty
for most children comes from uncertainty about the meaning of 
the words. The majority of the older children can puzzle out the 
pronunciation of most of the words whose meanings they know. 
They can even give a reasonable pronunciation to nonsense words. 
The test maker simply runs out of meaningful possibilities for 
making items more difficult by means of perceptual similarities 
alone. But we recognize that, for an older child, having a good 
reading vocabulary means more than just being able to pronounce 
words. The developing student learns new word meanings that a 
few years ago were not familiar to him in speech. Some of these 
new words may even now be unfamiliar to him in speech, but their
meaning is recognized in print. Thus, a reading vocabulary test for 
older children is more concerned with whether the child under- 
stands a variety of words that he may find in written material. 

This change occurs gradually in tests intended for increasingly 
more able readers. The title of the test remains the same ("reading 
vocabulary" or whatever the testmaker chooses to call it), but the 
ability that is tested appears to change quite radically. As repre- 
sented by the harder items in a third grade test, or by the majority 
of items on a fourth grade test, the reading vocabulary test has 
evolved into a test that is nearly indistinguishable from the vocab- 
ulary section of many group intelligence tests. Thus, the correla- 
tion between a reading vocabulary subtest at the fourth grade level 
and a verbal IQ test is likely to be as high as the correlation 
between the reading vocabulary subtest and a reading comprehension subtest.

Grade changes in reading comprehension tests roughly parallel those described for reading vocabulary subtests, though they are perhaps less drastic and less obvious. In the primary grades, the comprehension tests are more concerned with the straightforward interpretation of concrete statements and relationships, often those that are easily pictured. Sentences are simple, the number of items to be related is limited, and items to be related are not widely separated in the text. In later grades, greater stress is laid on inferences, on understanding complete ideas and difficult sentences, and on applying background knowledge.

Since these grade changes in reading tests are so obvious— 
particularly in the case of the vocabulary subtest—why aren't they 
more prominent in our thinking about the meaning of reading test 
scores? I believe there are at least two reasons. We recognize the 
changes in the content of arithmetic tests partly because these 
changes reflect the formal introduction of specific topics in our 
teaching of arithmetic. We introduce long division or the addition
of fractions as a specific topic of instruction. We don't expect the 
students to know much about these operations before they are 
formally taught and, after they are taught, we expect to see them 
featured in arithmetic achievement tests. Except for the so-called 
decoding stage of reading instruction, we don't have such clear-cut 
ideas about separate topics in reading instruction. This situation is 
natural enough, for beyond the decoding stage, advancement in 
reading depends so much on the child's developing language abili- 
ties that interact with almost all other instruction and experience. 
We do, of course, often try to teach specific skills, such as locating
the main idea or understanding poetry. We are relatively uncertain
about how to teach such skills; they often seem to develop without
specific instruction, and they are highly intercorrelated.

A second reason that we are relatively unconcerned about 
grade changes in the content of reading tests is that the same 
children who learn the decoding skills readily also typically con- 
tinue to score well on later tests of richness of vocabulary or 
inference. There is considerable evidence of this stability of performance. For example, unpublished studies by Joseph Breen show correlations generally in the 70s between reading achievement at the end of grade one or grade two and reading achievement in the fourth or fifth grade (3). Now such stability could be taken as evidence that the tasks posed by reading tests really do not change very much from first to fifth grade. I have offered the high correlation between intermediate grade reading vocabulary tests and verbal aptitude tests as evidence that they are testing about the same thing. The difference in the two cases is partly the evidence of one's eyes. The reading vocabulary sections of a reading test and of a paper and pencil verbal aptitude test look alike. They were prepared following similar principles to test, in printed form, richness of vocabulary. On the other hand, reading vocabulary and comprehension items for the early grades are built on
different principles from those for later grades and the result is 
readily apparent in the items. 

There are other considerations to make one reject the high 
correlation between first and fifth grade reading scores as evidence 
that items designed to test decoding skills are actually testing the 
same ability as later items. One of those considerations is that 
some of the variance in scores at first and second grade level is 
based on items like those for higher grades. The harder items on
second grade tests, at least, are often constructed like those for
higher grades, since the norms on such tests extend into the 
intermediate grade level. Again, this situation results partly from 
the fact that reading achievement for many children in the inter- 
mediate grades is not so dependent on specific school instruction 
as achievement in some other subjects. 

There is another consideration that argues against accepting 
the high correlation between beginning reading achievement and 
later reading achievement as evidence that the beginning and later 
items are measuring the same reading skills. This consideration is 
that first and second grade reading achievement scores correlate
remarkably highly with all kinds of later academic achievement,
including arithmetic, not just with later reading achievement. In 
Breen's studies, correlations between first or second grade reading 
scores and fourth or fifth grade arithmetic scores were also in the 
70s, though somewhat lower than correlations with fourth and 
fifth grade reading scores. Correlations between first or second 
grade reading scores and composite scores on the Iowa Test of 
Basic Skills in fourth or fifth grade were in the 80s. The first or 
second grade reading items are clearly not arithmetic items. They 
are simply measuring something that is strongly related to later 
achievement. 
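A correlation is easier to weigh when it is squared: r² gives the proportion of variance two measures share. The short Python sketch below reads "in the 70s" and "in the 80s" as the ranges .70 to .79 and .80 to .89 (an assumption about how the figures were rounded); on that reading, early reading scores share roughly half to nearly two-thirds of their variance with later reading or arithmetic scores, and roughly two-thirds to four-fifths with later composite scores.

```python
def shared_variance(r):
    """Proportion of variance two measures have in common, given their correlation r."""
    return r ** 2


# Assumed numeric ranges for the rounded figures reported from Breen's studies.
ranges = {
    "early reading vs. later reading or arithmetic (the 70s)": (0.70, 0.79),
    "early reading vs. later ITBS composite (the 80s)": (0.80, 0.89),
}

for label, (low, high) in ranges.items():
    print(f"{label}: {shared_variance(low):.0%} to {shared_variance(high):.0%} of variance shared")
```

Run as written, this prints 49% to 62% for the first range and 64% to 79% for the second.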

Why are early reading scores so highly related to later school achievement? Do teachers continue to favor children who are initially favored by them? Do scores on early reading tests influence teachers' expectations and lead to self-fulfilling prophecies? Do homes that provide support for early success in reading continue to provide good support and encouragement for other school achievement? Do children who have the capacity to learn to read easily also have good capacity for other learning? Do children who are adaptable and malleable enough in the school environment to participate well in beginning reading instruction also participate well and thus learn more from later instruction? Does the reading skill itself, and the knowledge gained through using it, contribute so much to school achievement in other subjects that growth in achievement is essentially determined by it? Probably all these things are true in varying degrees. You can undoubtedly add other reasons to the list. My own belief is that, of the possibilities mentioned, perhaps the most important is the continuing and reasonably consistent influence of the home environment. There are great variations in the degree to which the home provides a source of motivation and support, establishes habits of attention and cooperation, provides a background of useful skills and information, and, probably not least in importance, supplies actual instruction on school subjects.

In any case, for whatever reasons, reading ability at the end of first or second grade is highly related to later achievement in reading and other subjects. Put another way, a child who has not learned to read by the end of the second grade is in deep trouble in most school systems; the child who does not learn to read in first or second grade finds that he has been planted in a child's garden of reverses. There are exceptions, of course, but most such children are in for a long career of frustration and failure. That there is a strong correlation between early success in reading and later school achievement does not necessarily mean that preventing early reading failures would drastically reduce later school failures. The effects of a prevention program would depend on the reasons for the strong relationship between early reading achievement and later school achievement. On the other hand, we do know that if nothing is done, those children who now do not learn to read in the first two years are very likely to be saddled with failure for the rest of their school careers. It is surely worth a try, worth an all-out effort, to see that every child who doesn't make good progress in early reading has every incentive and every
opportunity to learn the skill. I am not suggesting that all children 
can achieve equally well, simply that the school should recognize 
what an extremely serious matter it is when a child doesn't learn 
to read in the first grade or two and that the school should do all 
that possibly can be done at that time rather than waiting until 
later. 

So far, I have been illustrating the point that the nature of
reading achievement tests changes markedly from the first grade to
the intermediate grades. Now let us look at the other side of the
statement that introduced this point, namely that at a given
educational level there is not much difference between reading
subtests with different names. Correlations between the vocabulary
subtest and the comprehension subtest generally approach the
reliabilities of the individual subtests. There is still room for the
two subtests to be measuring somewhat different achievements, but
for individual pupils the difference between the vocabulary score and
the comprehension score must generally be very large before we
can put much faith in this difference actually reflecting a true
difference in achievement in the two areas. The same statement
applies with even greater force to attempted subdivisions of the
vocabulary and comprehension tests. At the intermediate grade
level and above, repeated studies of different types and formats of
vocabulary testing emphasize that more or less the same
achievement is being measured by the different types of vocabulary
tests. There is, indeed, some difference, but the value of separate
subtest scores for different types of vocabulary tests at the
intermediate grade level and above seems questionable at this time.
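Why the difference must be so large can be shown with the standard classical-test-theory formula for the reliability of a difference between two scores of equal variance, r_DD = ((r_xx + r_yy)/2 - r_xy) / (1 - r_xy), where r_xx and r_yy are the subtest reliabilities and r_xy is their intercorrelation. The short Python sketch below applies that formula; the reliabilities of .90 and the intercorrelation of .85 are hypothetical values chosen only to mimic the situation just described, not data from any particular test.

```python
def difference_score_reliability(r_xx: float, r_yy: float, r_xy: float) -> float:
    """Reliability of the difference X - Y in classical test theory.

    Assumes X and Y have equal standard deviations.
    r_xx, r_yy: reliabilities of the two subtests.
    r_xy: observed correlation between the two subtests.
    """
    if r_xy >= 1.0:
        raise ValueError("r_xy must be less than 1")
    return ((r_xx + r_yy) / 2.0 - r_xy) / (1.0 - r_xy)


# Hypothetical subtests: each with reliability .90, intercorrelated .85.
r_dd = difference_score_reliability(0.90, 0.90, 0.85)
print(f"Reliability of the difference score: {r_dd:.2f}")  # 0.33
```

With these assumed values only about a third of the variance in a vocabulary-minus-comprehension difference is reliable, which is why a single pupil's difference must be large before it deserves much faith.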

At the stage of beginning reading, however, there is probably
room for more differentiation of the skills that are tested than has
so far been incorporated into most tests. Any achievement test
should, of course, be directly relevant to what is being taught in
the school. At the present time, there is considerable variation in
the way beginning reading is taught. Most reading vocabulary tests
for the first two grades include a mix of items for measuring the
outcomes of these different emphases. It is probably at these
earliest stages of reading instruction that criterion-referenced
measurement can be most meaningful and helpful at the present time
in assessing reading achievement. At advanced stages of achievement,
criteria will be much harder to specify, and if we follow our
intuitions in setting them, we are likely to obscure rather than
clarify the problem of the taxonomy of reading ability. Some criteria
that will seem to make common sense will not help us understand what
skills we need to teach. We need to continue to study this problem
of the skills and abilities that comprise reading achievement.

At the intermediate and higher levels, separating different
types of comprehension is about as difficult as separating different
aspects of vocabulary achievement. The work of Davis (1, 2)
clarifies the nature of this problem and indicates some of the
potentials that exist. At the present time, the most promising
distinction, exclusive of vocabulary, would appear to be that between
understanding facts explicitly stated in the reading passage and
making inferences from what is stated. Even this distinction is not
an easy one, and we should require a clear demonstration (such as
Davis has been attempting to provide) that two subtests are
measuring this distinction before we pay much attention to
comprehension subtest scores that claim to represent different
aspects of comprehension ability.

Let me now illustrate the significance of the changes in the
nature of reading tests from first grade to the later grades by giving
an example of how these changes might influence our understanding
of test results. It was noted earlier that by the intermediate grades
the reading vocabulary subtest ended up being essentially like the
vocabulary section of a group intelligence test. Some school
systems have recently abandoned the use of so-called intelligence
tests on the grounds that they lead to discrimination against pupils
whose backgrounds have not equipped them well for traditional
school studies. When one evaluates the justification for this step, it
becomes evident that the potential harm from the intelligence test
lay in its title and in the surplus meaning given to the scores, not
in the information it actually provided. It provided information
about the student's current ability to learn academic subjects
through reading or listening to expositions of academic material
in standard English. The reading vocabulary test provides that kind
of information, too. In fact, a reliable reading test is likely to
predict later school achievement about as accurately as an IQ test.
But look at the difference in attitudes toward these two test
scores. A reading vocabulary test is looked on as a measure of the
school's accomplishment or the school's failure, whereas the
vocabulary section of an intelligence test yields a score that is
someone else's responsibility. One way of indexing the difference
in attitude toward the two types of tests is to note the difference
in the temptation to coach students on the answers to the two.
Coaching, and other fraudulent ways of making sure that the
reading test scores of a class or school look good, has become a
serious problem in some schools. Coaching is ordinarily not a
problem for IQ tests given by the school. A low IQ score is taken as
an indication that the child will have difficulty in learning. It can
even serve as an excuse.

The teacher may not realize that the reading vocabulary test is
very like a section of the IQ test. But the teacher does know that a
child who scores low on the reading test will have difficulty in
learning at school, just as she knows that the child who scores low
on an intelligence test is likely to have difficulty learning in
school. The teacher will probably assume, however, that the
difficulties have different sources and different remedies. She
believes that the remedy for the low reading test score and for the
difficulties that it indexes is to teach the child to read. She is
likely to see a low score on an IQ test as meaning that she can't
teach the child to read.

My purpose in raising these questions about the similarity
between reading vocabulary and IQ vocabulary tests and about the
difference in reaction to them is not to get the reading tests
abandoned too. The reading part of a reading test is the
comprehension subtest, and surely we do want to know how well
children are learning to read. Rather, I wish to point out that
similar experiences and similar background factors influence the
scores on the reading vocabulary test and on the vocabulary section
of the IQ test.

In the past, we have tended to think of the intelligence test
score as reflecting the child's past and as indicating the extent to
which the school will have trouble teaching him in the future. We
have thought of the reading test score as reflecting the school's
work in the past and as indicating the extent to which the child
will have trouble in the future. We will face more intelligently the
tasks of teaching reading, and will face with even greater
determination the whole job of education, when we understand the
functions and problems of measurement well enough to realize that
both scores reflect the child's past and what the school has done,
and that both scores suggest future needs and opportunities for
both the child and the school.






References 

1. Davis, F. B. "Research in Comprehension in Reading," Reading Research
Quarterly, 4 (1968), 499-545.
2. Davis, F. B. "Psychometric Research on Comprehension in Reading,"
Reading Research Quarterly, 7 (1972), 628-678.
3. Thorndike, R. L. "Reading as Reasoning," Reading Research Quarterly, in
press.






Ronald P. Carver



American Institutes 
for Research 



READING AS REASONING: 
IMPLICATIONS FOR MEASUREMENT* 



In 1917, Edward L. Thorndike (15, 16) presented the argument
that reading was basically a reasoning process, and in 1971 Robert
L. Thorndike (17) presented factor analytic and correlational data
which he interpreted as supporting this idea. Furthermore, R. L.
Thorndike argued that if we desire better readers, the challenge is
to develop ways of teaching people to think rather than primarily
concentrating on reading. He concluded that it is primarily meager
intellectual processes that are limiting reading comprehension, not
deficits in one or more specific and readily teachable skills.

Before embarking upon another innovative teaching program
designed to improve reading skills by teaching reasoning skills, it
seems prudent to take a close critical look at the measurement
techniques used to collect the data supporting the reading as rea- 
soning argument. It appears that the close relationship found be- 
tween reading and reasoning may be an artifact of the measures 
employed. The primary purpose of this article will be to analyze 
critically the relationship between reading and reasoning with the 
aim of illuminating the test and measurement problems involved. 

First, a background for discussion of the research by the two 
Thorndikes will be presented, and then the implications for pres- 
ent day reading tests will be discussed. Finally, suggestions for 
developing future reading tests will be presented. 

*The preparation of this paper was supported in part by the Personnel and Training 
Programs of the Office of Naval Research, Contract No. N00014-72-C-0240.






BACKGROUND 

Before delving into the critical analysis of the Thorndike
research, we need to establish a frame of reference for what
is meant by reading. An understanding of the reading process is
crucial to any interpretation of the idea of reading as reasoning,
so a fairly elaborate explication of the reading process seems
justified. The following quotation from Spache (12) helps to convey
the extensiveness of what is intended by references to the process
of reading comprehension.

The reader first recognizes words by their form, shape,
structural parts or by the implications of the context. Each word calls
forth one or several meaning associations which the reader tries out
for appropriateness in this contextual setting. He accepts what seems
to be the most relevant meaning or associative thought and proceeds
to the next word, again choosing an association which seems
logically related to the preceding word. Various groups of words form
cohesive associations as he reads through the elements of the
sentence. These groups of ideas or details coalesce into the stated or
implied meaning of the sentence. The meanings of successive
sentences may be combined inductively into the main idea of the
paragraph. In deriving the main ideas of the paragraph, the reader may
recognize cause-effect, question-answer, hypothesis-proof or other
relationships which contribute to the generalization. Or these
sentence meanings may form the basis of original deductions, such as
implications or unstated conclusions, or ideas associated with but
tangential to the main idea of the paragraph.

The reader may go far beyond simple comprehension of the
literal, implied or tangential meanings to evaluation of the ideas
offered. He may question their authenticity, deny their implications,
or reject the bias or prejudice present. He may be moved to consult
other sources for verification, to check the author's background, to
compare the author's value judgments. Finally, the reader may
utilize the author's ideas or viewpoint in a creative treatment of the
same topic basing his own ideas upon those he has read, or refuting
them by proper logic or proof.

It seems convenient to ferret out four separate levels in
Spache's total description of the reading comprehension process.
Level 1 is associated with the words as units and involves both the
decoding of words and the determination of their meaning as used
in the particular sentence being read. Level 2 is associated with
sentences as units and involves the combination of the meanings of
the individual words into the complete understanding of the
sentence. Level 3 is associated with the paragraph as a unit and may
involve the recognition of the implied main idea of the paragraph.
Level 4 is associated with no particular unit and may involve
thinking activities which are not at all associated with the literal,
implied, or tangential meanings of the prose. It is important to
note that Level 4 would seem to involve a great deal of what is
normally regarded as reasoning. In fact, Level 4 would not seem to
be a part of the ongoing process of reading at all. It seems to
involve activities that probably could not occur at the same time
as Levels 1 and 2. Level 4 is probably best regarded
as not being part of the reading process at all, although Spache and
others may want to include it as part of what they regard as the
total reading comprehension process. Level 3 also seems to involve
a great deal of reasoning. The recognition of main ideas and of
cause-effect, question-answer, and hypothesis-proof relationships
would seem primarily to involve basic intellectual processes that
may or may not be functioning at the same time as Levels 1 and 2.
In any event, because of the inherent nature of the activities that
take place during Level 3, it could be assumed that a primary
intellectual function, called reasoning, would have to be involved.
Levels 1 and 2 seem to capture the essence of the ongoing reading
process, and it is not at all easy to infer the extent to which a
basic intellectual process such as reasoning is involved in the
execution of Levels 1 and 2.

It appears that the functioning of each of the levels noted
above would depend upon the functioning of every lower level,
and the higher the level, the more obvious is the functioning of
reasoning. If reading is taken to include Levels 1, 2, 3, and 4, then
it is easy to understand how reading could be regarded as involving
a great deal of reasoning because of the way Levels 3 and 4 have
been defined. What is crucial is the relationship between reasoning
and what happens in Levels 1 and 2, the essence of the reading
process. When the reading process breaks down, it becomes
crucial to know why. Was it because the level of reasoning ability
was not high enough to match the level required by the material?
Or was it because the individual had simply not yet learned how to
recognize the words and determine their meaning within the
sentence? If we understand the relationship between reasoning and
reading (Levels 1 and 2), then we shall be in a much better
position to diagnose and prescribe when confronted with a
dysfunctioning reader.

With this background, we are now ready to examine the
research of the Thorndikes (15, 16, 17).

THORNDIKE, 1917

E. L. Thorndike seemed to be interested in Levels 1 and 2 of
reading. He seemed to want to learn more about the processes
involved in reading while it was occurring, i.e., the ". . . dynamics
whereby a series of words whose meaning is known singly produces
knowledge of the meaning of a sentence or paragraph."
Thorndike speculated in great detail about the thoughts which
accompany the words while they are being read by a poor reader
as compared to an expert reader. Although Thorndike had a
primary interest in Levels 1 and 2 of reading, his research did not
seem to concentrate on Levels 1 and 2 as they are executed in
normal reading situations. As Tuinman (18) has noted, Thorndike
was highly interested in showing that reading fits into the
prevailing stimulus-response psychology of the time. He was
interested in demonstrating that reading could involve the higher
intellectual processes such as reasoning just as much as does
mathematical calculation. Thus, we find Thorndike using extremely
difficult reading material (13). It should not go unnoticed that
Thorndike, in this research, did not use paragraphs that had been
taken from existing school reading materials. He devised his own
paragraphs, and because of this and because of the nature of the
paragraphs, it is questionable whether one should agree that they
were representative of the ordinary reading done by his subjects,
as he contended (14). The paragraphs indeed seem like exercises in
logic, and when one examines them it is not difficult to
understand how he could conclude that reading involved reasoning.

Besides the passages chosen by Thorndike, there are other
reasons why his research does not seem to provide evidence
relevant to the relationship between reasoning and Levels 1 and 2 of
reading. Thorndike's research task involved passages and questions
on the passages. There is a problem in making valid inferences
about the ongoing reading process from the answering of certain
questions presented subsequent to the reading itself. If an
individual does not answer a question correctly, it might be because
of a deficit in the reasoning process that occurred during reading (as
Thorndike seems to infer); it might be because of a deficit in the
reasoning process that occurred in connection with an attempt to
answer the question itself; or it might be because a failure to
execute Level 1 of the reading process made it impossible to
answer a Level 3 or Level 4 type question. Thorndike attempted
to deal with this problem by considering the passage and the
questions as a single unit so that he could increase difficulty or
degree of understanding either by replacing paragraphs or rewording
questions (15). Since questions are not an inherent appendage to
reading, it seems prudent to conclude that Thorndike was not using a
research paradigm which allowed optimum generalization to the
reading process.

There is direct evidence that E. L. Thorndike used Level 3
type questions in his research. For example, the very first question
he presents in support of his argument is: "What is the general
topic of the paragraph?" Another example of the type of
questions he asked pupils in Grade 6 is: "What condition in a pupil
would justify his nonattendance?"

It also appears that Thorndike did not adequately control for
failures in Level 1 reading, the part that would seem least to
involve reasoning. These same failures could also involve what is
known as decoding problems today. Thorndike (16) was aware of
this problem, as the following quotation demonstrates:

In general, the material used here will be paragraph and
questions whose words singly are fairly well known to the pupils in
question, but whose sentence structure is somewhat more elaborate
than pupils of the grade in question can manage. That is, the study is
primarily concerned with the ability of the pupil to understand
totals, few of whose elements are unknown, but whose internal
relations are somewhat intricate and subtle.

This plan seems reasonable, but consider the following words
Thorndike used in the paragraph and questions he administered to
sixth graders in 1917: session, contagious, impassable,
compulsory, and excusable. There is no direct evidence that all or
even most of these sixth graders knew these words (i.e., could
execute Level 1).

In summary, Thorndike seemed to be interested in ordinary
reading (i.e., the functioning of Levels 1 and 2), but his research
seemed to 1) involve unordinary reading materials; 2) include
questions that were definitely reasoning-type questions (i.e., Level
3 reading type questions); and 3) inadequately control for Level 1
dysfunctions accounting for Level 2, 3, and 4 failures. Because of
these aspects of his research, it seems reasonable to conclude that
Thorndike's research contributed little or nothing to our
knowledge of the relationship between reasoning and the primary
aspects of the reading process, i.e., Levels 1 and 2. It seems easy to
agree that reading involves high degrees of reasoning when reading
is taken to include Levels 3 and 4, but this is not a relationship; it
is a definition.

THORNDIKE, 1971 

R. L. Thorndike presented a paper entitled "Reading as
Reasoning," upon receipt of the Edward L. Thorndike Award at the
1971 meeting of the American Psychological Association (17).
Included in the various data analyses of R. L. Thorndike was a
factor analysis of the data presented by Davis (7). Thorndike
concluded that one factor, reasoning, could be interpreted as
accounting for the predominant portion of the variance. This analysis
and conclusion are in agreement with those of Carver (3), who
reported that the intercorrelations among the Davis variables were
above .90 when the reliability coefficients were used to correct for
attenuation.
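The correction for attenuation referred to here is, in its usual (Spearman) form, the observed correlation divided by the square root of the product of the two reliabilities; it estimates how highly the two measures would correlate if both were perfectly reliable. A minimal Python sketch follows; the observed correlation of .72 and the reliabilities of .80 and .75 are hypothetical, not Davis's figures.

```python
import math


def correct_for_attenuation(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Spearman's correction for attenuation.

    Estimates the correlation between true scores from the observed
    correlation r_xy and the reliabilities r_xx and r_yy.
    """
    if r_xx <= 0 or r_yy <= 0:
        raise ValueError("reliabilities must be positive")
    return r_xy / math.sqrt(r_xx * r_yy)


# Hypothetical values: observed r = .72, reliabilities .80 and .75.
print(f"{correct_for_attenuation(0.72, 0.80, 0.75):.2f}")  # 0.93
```

Two measures whose observed correlation is only moderate can thus be estimated to correlate above .90 at the true-score level, which is the sense in which the Davis variables appeared to be measuring nearly the same thing.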

R. L. Thorndike has presented a great deal of data which
convincingly shows that the ability to answer questions on existing
standardized reading tests is so highly correlated with achievement
tests and intelligence tests that it is reasonable to conclude that
they are measuring the same thing. Yet the high correlations
among reading and intelligence tests seem best regarded as
artifactual and as having little or nothing to do with the nature of
reading. The reasons for this involve the way the tests are made.

The intelligence tests in Thorndike's research were group tests,
and all, therefore, required reading. Thus, these measures of basic
intellectual functioning were all contaminated to an unknown but
presumably high degree by variations in reading ability.

More importantly, the standardized reading tests used by R. L.
Thorndike were all contaminated, to an unknown but presumably
high degree, by intelligence (reasoning-type) questions. Farr (9)
has recently discussed and given examples of how reading tests
today bear a strong resemblance to group verbal intelligence tests.
One of Farr's examples, taken from a well-known reading
achievement test, is given below:

The sheep were playing in the woods and eating grass. The 
wolf came to the woods. 

Then the sheep 

1. went on eating.

2. ran to the barn. 

3. ran to the wolf. 

The reading tests of today make no effort to discriminate
between questions relevant to Levels 3 and 4 and questions
relevant to Levels 1 and 2 when they select and score test items for
passages. Since traditional item selection techniques involve the
selection of the items which best discriminate among individuals,
and since intelligence or reasoning-type items tend to be the best
in this regard, it is not surprising that standardized reading tests
have evolved into standardized verbal intelligence tests. It is
unfortunate but true that E. L. Thorndike's research has provided the
justification for permitting reading tests to become reasoning tests.
If questions that are clearly reasoning-type questions make up
much, if not most, of present-day standardized reading tests, then
it should not be surprising to find that reading test scores are
reasoning scores. Yet we shall not learn much about the reading
process, Levels 1 and 2, by employing a task that reflects directly
upon Level 3 and Level 4 reading, which are already known to
involve high degrees of reasoning.
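The selection mechanism described above can be illustrated with a simple upper-lower discrimination index, D = (proportion correct in the top-scoring group) minus (proportion correct in the bottom-scoring group), a common classical item statistic; using the top and bottom 27 percent is one conventional choice, assumed here. The Python sketch below uses a small, entirely hypothetical response matrix in which item 0 is answered correctly by every pupil, as a straightforward literal-comprehension item might be. Ranking items by D discards that item, whatever its value for showing whether pupils understood what they read.

```python
from typing import List

# Hypothetical right/wrong (1/0) responses: one row per pupil, one column per item.
RESPONSES: List[List[int]] = [
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]


def discrimination_index(responses: List[List[int]], item: int,
                         fraction: float = 0.27) -> float:
    """Upper-lower index D = p(correct | top group) - p(correct | bottom group).

    Pupils are ranked by total score; ties keep their original order
    because Python's sort is stable.
    """
    totals = [sum(row) for row in responses]
    order = sorted(range(len(responses)), key=lambda s: totals[s])
    n = max(1, round(fraction * len(responses)))
    lower, upper = order[:n], order[-n:]
    p_upper = sum(responses[s][item] for s in upper) / n
    p_lower = sum(responses[s][item] for s in lower) / n
    return p_upper - p_lower


def select_items(responses: List[List[int]], k: int) -> List[int]:
    """Keep the k items with the largest discrimination index."""
    n_items = len(responses[0])
    ranked = sorted(range(n_items),
                    key=lambda i: discrimination_index(responses, i),
                    reverse=True)
    return ranked[:k]


for i in range(len(RESPONSES[0])):
    print(f"item {i}: D = {discrimination_index(RESPONSES, i):.2f}")
print("selected:", select_items(RESPONSES, 2))  # selected: [1, 2]
```

The item everyone answers correctly has D = 0 and is the first to go, which is the pressure, sketched in miniature, that pushes reading tests toward items on which pupils differ most.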

Up to this point it has been argued that if reading is measured
by passages and questions that obviously require reasoning, then it
is not surprising that reading appears to be reasoning.
Yet, informally, colleagues have rebutted that even questions that
do not seem to be reasoning-type questions correlate highly with
those that are. This purported inconsistency also may be an
artifact of the way tests are developed. For example, consider the
Davis (7) study, which had a variable called "finding answers to
questions answered explicitly or merely in paraphrase in the
content." This variable would appear to be an indicant of reading
Level 2. And this variable seems to be just as much a reasoning-type
variable as the others which were more obviously reasoning-type
variables. Yet there is a major problem involved in the
interpretation of this inconsistent result. Not only does one have to
make sure the question is not of a reasoning type, but one also has
to make sure that the alternative wrong answers provided on a
multiple-choice test do not inadvertently shunt the item off into a
reasoning-type question.

The above-mentioned problem is a serious one, given existing
item writing and selecting techniques. If an alternative wrong
answer does not draw any responses (i.e., it is a poor distractor), it
is usually rewritten so that it becomes more credible, i.e., more
people choose it. Thus, a test question may appear to require little
or no reasoning, but when "good" alternative wrong answers are
provided, the test may be automatically changed so that it
requires varying degrees of reasoning not obvious from the
question itself. The degree to which this is true in the Davis research
is not known. Yet this is an inherent problem in all multiple-choice
tests, and research which was not designed to control
for this artifact should be interpreted with caution in regard to the
relationship between reading and reasoning.
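The routine described here, spotting distractors that draw few or no responses so they can be rewritten, amounts to a frequency count per option. The Python sketch below is a hypothetical illustration: the 5 percent threshold, the option letters, and the response data are assumptions, and test developers use various rules of their own.

```python
from collections import Counter
from typing import Dict, List, Sequence


def distractor_shares(choices: Sequence[str], options: str = "ABCD") -> Dict[str, float]:
    """Proportion of examinees choosing each option."""
    counts = Counter(choices)
    n = len(choices)
    return {opt: counts[opt] / n for opt in options}


def weak_distractors(choices: Sequence[str], key: str, options: str = "ABCD",
                     min_share: float = 0.05) -> List[str]:
    """Wrong options chosen by fewer than min_share of examinees (hypothetical rule)."""
    shares = distractor_shares(choices, options)
    return [opt for opt in options if opt != key and shares[opt] < min_share]


# Hypothetical item keyed "B", answered by 20 pupils; nobody chose "D".
choices = ["B"] * 14 + ["A"] * 4 + ["C"] * 2
print(distractor_shares(choices))
print("flag for rewriting:", weak_distractors(choices, key="B"))
```

Option D would be flagged and rewritten to attract more pupils; the point in the text is that each such rewrite can quietly make the item demand more reasoning than its stem suggests.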

R. L. Thorndike concluded from his results that reading was
fundamentally reasoning, and he further suggested that improvement
in reading may only occur after instruction in reasoning.
Yet it seems more reasonable to interpret R. L. Thorndike's results
as supporting an alternative hypothesis: reading is not primarily
reasoning, but most standardized reading tests are actually
standardized reasoning tests. The high correlation between reading
tests and intelligence tests reported by E. L. Thorndike is still true
today. For example, most of the correlations between the STEP
Reading Test and the SCAT (an intelligence test) are reported
to be above .80, according to the manual for the test. But it seems
more reasonable to interpret these high correlations as artifacts of
the way the tests are developed than as support for the idea
that reading is primarily reasoning.

A CRITIQUE OF STANDARDIZED READING TESTS

E. L. Thorndike's technique of presenting paragraphs with
questions beside them has influenced standardized testing of
reading achievement to the present day, as evidenced by the tests used
in R. L. Thorndike's research. What has changed through the past
fifty years is the addition of multiple-choice answers and highly
sophisticated ways of revising, scoring, analyzing, and reporting
test results. It does not matter to most psychometricians how
good the test questions are for measuring progress in the
understanding of the sentences that occurred during the reading of a
paragraph, as long as the questions have face validity, discriminate
reliably among individuals at any given level, and demonstrate
level-to-level group mean increments. It was fortunate for
psychometricians that E. L. Thorndike focused upon the reasoning
aspects of reading, since it is quite easy to develop tests that
satisfy the preceding psychometric criteria using reasoning-type
questions. For school-age individuals, intelligence-type questions
always produce large individual differences and show maturational
increases from year to year. If E. L. Thorndike had focused upon
questions that did not produce large individual differences,
psychometricians would have had large problems adapting their
sophisticated statistical techniques to the development of reading
tests.

To illuminate this undesirable influence of E. L. Thorndike
upon present-day measures of progress in reading, a hypothetical
situation will be presented. Suppose a paragraph is selected from a
sixth grade reading book, and questions are constructed which
are designed to ascertain whether a student has read and
understood the complete thoughts (i.e., sentences) that the writer
intended to communicate (reading Levels 1 and 2). Suppose
most of the sixth graders can get all of these questions correct, i.e.,
the variability in the group approaches zero. In this situation, all
traditional estimators of reliability and validity will also approach
zero (4). The traditional psychometrician will throw up his hands
in horror in this situation. Yet this type of test situation may be
the best way to measure progress in reading. (Empirical data have
been presented to support this measurement method; see 6.) If a
test on a paragraph measures levels of progress, then the variability
among individuals may be found not primarily on the test but in
the time or amount of instruction required for the individual to
reach this level of mastery (2). Yet variability in time is anathema
to the traditional calculation of test percentiles and reliability
estimates.
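The claim that reliability estimates collapse as group variability vanishes follows from the classical test theory definition of reliability as the ratio of true-score variance to observed-score variance, sigma_T^2 / (sigma_T^2 + sigma_E^2). The Python sketch below holds a hypothetical error variance fixed at 1.0 and shrinks the true-score variance, as would happen if nearly every pupil mastered the paragraph; the numbers are illustrative only, and the sketch does not model the alternative of measuring variability in time to mastery.

```python
def ctt_reliability(true_var: float, error_var: float) -> float:
    """Classical test theory reliability: true variance / observed variance."""
    observed_var = true_var + error_var
    if observed_var <= 0:
        raise ValueError("observed variance must be positive")
    return true_var / observed_var


# Hypothetical: measurement error variance fixed at 1.0 while the spread of
# true scores shrinks as more and more pupils reach mastery.
ERROR_VAR = 1.0
for true_var in (4.0, 1.0, 0.25, 0.0):
    print(true_var, round(ctt_reliability(true_var, ERROR_VAR), 2))
# prints: 4.0 0.8 / 1.0 0.5 / 0.25 0.2 / 0.0 0.0
```

Nothing about the quality of the questions changes across these rows; only the spread among pupils does, yet the reliability coefficient falls to zero.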

Unfortunately, E. L. Thorndike and R. L. Thorndike continue
to influence psychometricians to be unconcerned about the
representativeness of their questions for indicating progress in
reading (Level 1 and Level 2). Consider, for example, the following
categories of items given in the manual for the STEP Reading Test:
reproduce ideas, translate ideas and make inferences, analyze
motivation, analyze presentation, and criticize. The understanding
that occurs as a result of the execution of Level 1 and Level 2 may
be more difficult to measure than the reasoning that occurs during
Level 3 and Level 4, but that does not seem to be justification for
allowing reading tests primarily to measure important but ancillary
aspects of the reading process. It would seem to be a better
strategy to let intelligence tests measure reasoning and reading tests
measure reading, Level 1 and Level 2. Why be inefficient and
duplicate these measures, even if some reading specialists do
consider reading comprehension to include the verbal reasoning
involved in Levels 3 and 4? At present, we are in the embarrassing
position of assigning a certain grade level of reading achievement
(e.g., a Grade Level 3 score) to a chance score on a test and not
really knowing if the student can read (Levels 1 and 2) at all. Or the
chance score may indicate that the questions (Levels 3 and 4) were
so irrelevant to the reading process that the student understood
the material that he read but could not sufficiently infer what the
answers to the questions were.

It is haunting to consider that today's standardized reading tests are probably measuring reasoning progress more than reading progress. Consider a pair of hypothetical twins: Twin A receives no instruction in reading for an entire year, and Twin B receives normal reading instruction. Twin A no doubt matured one year in reasoning ability whether he received instruction in reading or not. Twin A probably will show as much, or almost as much, progress on most standardized reading tests as Twin B. Thus, a school which has a poor instructional program will probably demonstrate about as much gain in a year's time as a school which has an excellent program, a haunting thought.

It is especially frightening to find that the U.S. Office of Education contracted with Educational Testing Service for a National Anchor Test Equating Study in Reading (10). This contract was for $698,000.* The results of this study most likely will be a highly accurate, norm-referenced intelligence test system under the guise of a reading test. After this study, it will seem logical that school systems will be forced by USOE to evaluate their federally funded innovative reading programs with one of the seven most popular tests that have been nationally equated. We have already witnessed a deprecation of preschool programs because they did not raise IQ scores, even though no one should expect IQ scores to be raised (11). Should we now expect to see innovative reading programs bite the dust when they are eventually subjected to psychometrically sound tests that primarily measure progress in reasoning? It seems rationally sound to expect school systems to help students learn to read more difficult material, but it seems rationally unsound to expect good reading instruction to have much effect upon a student's fundamental ability to reason.

*This sum of money is approximately equal to the total amount of funds to be expended during the entire 1972 fiscal year in the Basic Research Program of the U.S. Office of Education. Reading researchers should be interested in the fact that USOE gave psychometricians $698,000 to make the centiles of seven norm-referenced tests more comparable, while USOE earmarked no federal funds for basic research in reading. USOE did earmark funds for basic research in economics and anthropology, but the USOE-initiated Targeted Research and Development Program in Reading received no USOE funds.

THE FUTURE OF STANDARDIZED READING TESTS 

It is known that the ability of almost every student to think or reason (e.g., mental age) increases each year throughout school age, and it is known that reading skill increases normally for some students and does not increase normally for others. It is not known whether certain levels of basic intellectual ability are required for certain levels of reading achievement. It may or may not be realistic to expect that almost all 10-year-olds can achieve a certain minimum level of adult reading skill given ample time and sufficient help (8).

Bloom (1) contended that there are certain hurdles in school that should be overcome before an individual is subjected to subsequent higher-level instructional treatment; otherwise, the student may never progress normally. One of the challenges in reading is to measure the achievement of these hurdles. Another challenge is to determine how much time an individual needs to achieve a hurdle, given a certain developmental level of reasoning. Just because a student is low in reasoning ability in relation to his same-age peers does not mean that his reasoning ability will not improve as he grows older, or that he should not be expected to attain a certain level of reading ability to match each level of his reasoning ability as it matures.

What appears to be needed are: 1) tests that actually measure progress levels in the ability to read, i.e., edumetric* or criterion-referenced tests that focus upon the ability to read and understand reading material of increasing difficulty, instead of psychometric or norm-referenced reasoning-type reading tests; and 2) tests of edumetric or criterion-referenced levels of reasoning ability instead of psychometric norm-referenced reasoning tests. Then answers to the following theoretical and practical questions about the relationship between reading and reasoning could be determined empirically.

1. Does the rate of growth in reading match the rate of growth in reasoning?

2. Does the level of reasoning ability always determine the highest level of reading ability?

3. Can the level of reasoning ability be used to set the expectation level of reading status and thereby be used to evaluate the progress of the students and the "goodness" of the school system's instructional program?

E. L. Thorndike in 1917 and R. L. Thorndike in 1971 have focused upon an important problem, but the past, present, and future ill effects of this focus should not be overlooked. What is needed at this time is more attention directed toward the measurement of absolute levels of the ability to read the sentences that make up paragraphs, not the ability to answer reasoning-type questions on paragraphs. What is needed is an investigation of the relationship between absolute levels of reading and absolute levels of reasoning. Hopefully, the next fifty years will not find reading researchers in the same embarrassing situation of concluding from reading test data that the ability to answer reasoning-type questions on paragraphs mainly involves the ability to reason.



*The edumetric approach to testing refers to the focus upon measuring progressive within-individual gains of high relevance to education, as contrasted with the traditional psychometric approach, which tends to focus upon the static between-individual differences of high relevance to psychology (5).



References

1. Bloom, Benjamin. "Individual Differences in School Achievement: A Vanishing Point?" Phi Delta Kappa Address at the meeting of the American Educational Research Association, New York, 1971.

2. Carroll, John B. "A Model of School Learning," Teachers College Record, 64 (May 1963), 723-733.

3. Carver, Ronald P. "Analysis of 'Chunked' Test Items as Measures of Reading and Listening Comprehension," Journal of Educational Measurement, 7 (Fall 1970), 141-150.

4. Carver, Ronald P. "Special Problems in Measuring Change with Psychometric Devices," Proceedings of the A.I.R. Seminar on Evaluative Research, Strategies and Methods. Pittsburgh: American Institutes for Research, 1970.

5. Carver, Ronald P. "Reading Tests in 1970 versus 1980: Psychometric versus Edumetric," Reading Teacher, 26 (December 1972), 299-302.

6. Carver, Ronald P. "Measuring the Relationship between Reading Input and Understanding," unpublished manuscript, 1973.

7. Davis, Frederick B. "Research in Comprehension in Reading," Reading Research Quarterly, 3 (Summer 1968), 499-545.

8. Ellson, Douglas G. "A Critique of the Targeted Research and Development Program on Reading," Reading Research Quarterly, 5 (Summer 1970), 524-533.

9. Farr, Roger. "Measuring Reading Comprehension: An Historical Perspective," in F. P. Green (Ed.), Twentieth Yearbook of the National Reading Conference. Milwaukee: National Reading Conference, 1971, 187-197.

10. Jaeger, Richard M. "A National Test Equating Study in Reading," paper presented at the meeting of the Psychometric Society, St. Louis, April 1971.

11. Jensen, Arthur R. "How Much Can We Boost IQ and Scholastic Achievement?" Harvard Educational Review, 39 (Winter 1969), 1-123.

12. Spache, George D. Toward Better Reading. Champaign, Illinois: Garrard, 1963, 65.

13. Stauffer, Russell G. "Thorndike's 'Reading as Reasoning': A Perspective," Reading Research Quarterly, 6 (Summer 1971), 443-448.

14. Thorndike, Edward L. "An Improved Scale for Measuring Ability in Reading," Teachers College Record, 16 (November 1915), 31-53.

15. Thorndike, Edward L. "Reading as Reasoning: A Study of Mistakes in Paragraph Reading," Journal of Educational Psychology, 8 (June 1917), 323-332.

16. Thorndike, Edward L. "The Understanding of Sentences: A Study of Errors in Reading," Elementary School Journal, 18 (October 1917), 98-114.

17. Thorndike, Robert L. "Reading as Reasoning," Reading Research Quarterly, in press.

18. Tuinman, J. Jaap. "Thorndike Revisited-Some Facts," Reading Research Quarterly, 7 (Fall 1971), 195-202.






Robert L. Thorndike 

Teachers College 
Columbia University 



DILEMMAS IN DIAGNOSIS 



One of the common uses of psychometric devices in the field of reading, as in education generally, is for educational diagnosis. Diagnosis is most often a matter that relates to a specific individual, though from time to time we may be interested in making diagnostic judgments about groups. Diagnostic judgments are often based on the comparison of two measures in order to judge whether the individual shows some genuine discrepancy in the traits or characteristics that the two measures represent. Thus, if a child falls at the 50th percentile on a test of word knowledge but only at the 25th percentile on a test of comprehension of connected prose, the diagnostician must decide how much confidence to place in the conclusion that this child's ability to read connected prose falls short of his knowledge of word meanings. The whole armamentarium of diagnostic devices in the field of reading has its value in suggesting judgments of the type "Ability A is greater than Ability B."

But differential judgments about individuals are slippery customers. They are peculiarly subject to measurement error. Some 45 years ago, Kelley (2) warned of the need for especially reliable tests if such diagnostic judgments were to be made with confidence. Nothing that has developed since then has given the psychometrician occasion to change his views on this point. The diagnostician, however, cannot wait for the psychometrician to produce the perfect psychometric instrument in order to deal with the practical problems of his day-to-day functioning. He must get on with the job. And practical limitations of time and resources for carrying out his assessments mean that he will always have to use tools that fall short of psychometric ideals.

This being so, what help can psychometrics offer the diagnosti- 
cian to "carry on" while he waits for the perfect diagnostic bat- 
tery? Perhaps some guidance on the level of confidence that he 
should place in diagnostic judgments might be useful to tide him 
over. 

We must always remember that any test, or any other type of behavior observation, represents only a limited sample from some domain of behavior. It represents the domain only imperfectly, and the score that it produces is only an approximation to the score that the individual would get for the whole domain or, more realistically, that he would get on other samples drawn from that domain. We get evidence on this variability from sample to sample of behavior through the various procedures for obtaining a reliability coefficient, and we express it most usefully for our present purposes as a standard error of measurement. The standard error of measurement may be thought of as the standard deviation of a series of equivalent measures of the same individual, displaying the extent to which the measures scatter away from his "true score."
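The idea of a "series of equivalent measures" can be made concrete with a small simulation. This is a sketch under classical-test-theory assumptions that the chapter does not spell out: scores are in standard deviation units, each equivalent form adds independent, normally distributed error to one fixed true score, and the error standard deviation is therefore the square root of (1 - reliability). The function name `simulate_equivalent_forms` is hypothetical.

```python
import random
import statistics


def simulate_equivalent_forms(true_score, reliability, n_forms, seed=0):
    """Return one pupil's observed scores on n_forms equivalent forms.

    Assumes scores in standard-deviation units and independent normal
    errors, so the error standard deviation is sqrt(1 - reliability).
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    error_sd = (1.0 - reliability) ** 0.5
    return [true_score + rng.gauss(0.0, error_sd) for _ in range(n_forms)]


scores = simulate_equivalent_forms(true_score=0.5, reliability=0.90, n_forms=20_000)
# The scatter of the observed scores around the true score estimates the
# standard error of measurement; with reliability 0.90 it should be near
# sqrt(0.10), about 0.32 standard deviations.
print(round(statistics.stdev(scores), 3), round(0.10 ** 0.5, 3))
```

The simulated spread plays the role of the standard error of measurement; raising `reliability` shrinks it toward zero.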

Suppose, now, we have two measures, X and Y. For concreteness, let us say that X is a measure of word knowledge and Y a measure of paragraph comprehension. Suppose that results from the two measures are expressed on a common equal-unit score scale, such as T-scores or stanines for a common sample of sixth grade pupils. Suppose that Peter differs on the two tests by an amount D, and for concreteness let us say that this difference is 10 points on the T-score scale or 2 points on the stanine scale, i.e., a difference of exactly one standard deviation. How much confidence should we have that this difference represents something real and did not just happen because of errors of measurement in the two tests? How confidently can we expect a difference in the same direction, though obviously not of identically the same amount, if Peter is retested with equivalent forms of each of the two tests?

In setting our level of confidence, we need to take account of three things, two of which have already been mentioned. In the first place, we need to take account of the size of the standard errors of measurement for the two variables. The larger the errors (that is, the lower the reliability), the lower the confidence. The appropriate degree of confidence depends, secondly, upon the size of the observed difference between the two scores. The larger the difference, the greater the level of confidence. It depends finally, and quite critically, on the correlation between the measures, X and Y, of the two attributes that we are studying. The higher that correlation, the other two factors remaining the same, the less confidence one can have in the meaningfulness of the difference.

Let us look at the rationale for these relationships with specific figures for a definite example. Suppose that the word knowledge test (X) and the paragraph reading test (Y) are each known to have reliability coefficients of 0.90 for a sixth grade sample and that for the same sample the correlation between the two tests is 0.80. Consider Peter, who scored one standard deviation lower (relative to the standardization group) on the paragraph test than on the word test.

For a single test with reliability of 0.90, the standard error of measurement, expressed in standard deviation units, is:

$$\sqrt{1 - r_{xx'}} = \sqrt{1 - 0.90} = \sqrt{0.10} \approx 0.32 \qquad (1)$$

For the difference between two tests, both expressed in standard deviation units, the standard deviation of differences arising purely from measurement errors, which we might call the standard error of measurement of difference, is:

$$\sqrt{2 - r_{xx'} - r_{yy'}} = \sqrt{2 - 0.90 - 0.90} = \sqrt{0.20} \approx 0.45 \qquad (2)$$

Thus, a difference between scores of one standard deviation is equal to

$$\frac{1.00}{0.45} \approx 2.2$$

standard errors of measurement of the difference. Turning to tables of the normal curve, we find that a difference this large or larger, in the given direction, could be expected to occur in 13 cases out of 1000.
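The 13-in-1000 figure can be checked in a few lines. This sketch assumes normally distributed measurement errors, as the appeal to normal-curve tables implies, and reads the figure as one-tailed, i.e. a chance difference at least this large in the direction observed for Peter; the helper names are illustrative, not taken from the chapter.

```python
import math


def upper_tail(z):
    """Proportion of a standard normal distribution lying above z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def chance_proportion(difference_sd, rel_x, rel_y):
    """Share of pupils whose two scores would differ by at least difference_sd
    (in one direction) from measurement error alone, via equation (2)."""
    se_of_difference = math.sqrt(2.0 - rel_x - rel_y)
    return upper_tail(difference_sd / se_of_difference)


p_chance = chance_proportion(1.0, 0.90, 0.90)
print(f"{1000 * p_chance:.1f} per 1000")  # roughly 12.7, i.e. about 13 in 1000
```

Using `math.erfc` rather than `1 - cdf` keeps the small tail probabilities accurate when the differences become large.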

A parallel formula gives the standard deviation of differences between two tests when one knows what the correlation between the two tests is. When, as before, each test's scores are expressed in standard deviation units, the formula for the standard deviation of differences is:

$$\sqrt{2 - 2r_{xy}} = \sqrt{2 - 2(0.80)} = \sqrt{0.40} \approx 0.63 \qquad (3)$$



Thus, a difference between scores of one standard deviation is equal to

$$\frac{1.00}{0.63} = 1.59$$

standard deviations of the differences between these two quite highly correlated variables. Turning once again to our table of the normal curve, we find that, given a correlation of this size, differences this large will occur in 56 of 1000 cases. Of these 56, on the basis of our earlier calculation, we should expect that 13 were the result of nothing more than measurement error. This leaves 43 that presumably represent "real" differences. Thus, we may say that the odds are 43 to 13, or about 3 to 1, that the difference is a genuine one. The betting odds of 3 to 1 represent one way of expressing the confidence that we should feel in the diagnostic judgment that Peter is better at word knowledge than at paragraph reading.
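The whole worked example fits in one small function. This is a sketch of the rationale just described, not code from the chapter: it assumes normally distributed differences, uses equation (2) for the spread of chance-only differences and equation (3) for the spread of observed differences, and returns the ratio of "real" to chance differences. Because it does not round intermediate values to 0.45 and 0.63, it gives about 3.5 to 1, slightly more favorable than the hand-rounded 43 to 13 (about 3.3 to 1); the names are illustrative.

```python
import math


def upper_tail(z):
    """Proportion of a standard normal distribution lying above z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def betting_odds(difference_sd, rel_x, rel_y, r_xy):
    """Odds (real : chance) that an observed difference of difference_sd
    standard deviations between tests X and Y reflects a real difference."""
    sd_chance = math.sqrt(2.0 - rel_x - rel_y)   # equation (2)
    sd_observed = math.sqrt(2.0 - 2.0 * r_xy)    # equation (3)
    p_chance = upper_tail(difference_sd / sd_chance)
    p_observed = upper_tail(difference_sd / sd_observed)
    # When r_xy is as high as the reliabilities, observed differences are no
    # more frequent than chance ones, so the odds fall to zero.
    p_real = max(p_observed - p_chance, 0.0)
    return p_real / p_chance


print(f"{betting_odds(1.0, 0.90, 0.90, 0.80):.1f} to 1")
```

Setting `r_xy` equal to the reliabilities drives the odds to zero, which matches the chapter's point that such a pair of tests gives no basis for a diagnostic judgment.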

Following the same rationale that we have used in our illustration, it is possible to prepare tables showing the "betting odds" for representative combinations of reliability, intercorrelation, and size of difference. An illustrative set of such tables is presented in Table 1.
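Under the same assumptions, a row of such a table can be generated mechanically. The sketch below recomputes the one-standard-deviation row of the 0.90-reliability section of Table 1, writing the chance-only spread as the square root of 2(1 - average reliability), which is equation (2) with both reliabilities replaced by their average. The printed tables were presumably worked out from normal-curve tables with rounded values, so the sketch lands close to, but not always exactly on, the published odds (about 1.7 rather than 3:2 in the first column, about 18 rather than 17:1 in the last). The function names are illustrative.

```python
import math


def upper_tail(z):
    """Proportion of a standard normal distribution lying above z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def betting_odds(difference_sd, avg_reliability, r_xy):
    """Real-to-chance odds for an observed difference, given the average
    reliability of the two tests and their intercorrelation."""
    p_chance = upper_tail(difference_sd / math.sqrt(2.0 * (1.0 - avg_reliability)))
    p_observed = upper_tail(difference_sd / math.sqrt(2.0 - 2.0 * r_xy))
    return max(p_observed - p_chance, 0.0) / p_chance


def table_row(difference_sd, avg_reliability, correlations):
    """Odds for one row of a confidence table, keyed by correlation."""
    return {r: betting_odds(difference_sd, avg_reliability, r) for r in correlations}


correlations = [0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50, 0.40, 0.00]
row = table_row(1.00, 0.90, correlations)
for r, odds in row.items():
    print(f"r = {r:.2f}: about {odds:.1f} to 1")
```

Looping `table_row` over the difference sizes 0.25 to 2.00 would give a full section in the same layout as Table 1.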



TABLE 1.

Confidence Tables for Diagnostic Judgments: Odds that an Observed Difference between Two Variables Is a Real Difference

Section I: Average Reliability = 0.98

Difference in   Correlation Between Variables
S D Units       .95    .90    .85    .80    .75    .70    .60    .50    .40    .00
0.25            1:1    2:1    2:1    5:2    5:2    5:2    3:1    3:1    3:1    3:1
0.50            9:1    20:1
0.75
1.00
1.25
1.50
1.75
2.00

All others greater than 20 to 1.



Section II: Average Reliability = 0.95

Difference in   Correlation Between Variables
S D Units       .90    .85    .80    .75    .70    .65    .60    .50    .40
0.25            1:3    1:2    3:5    2:3    3:4    3:4    4:5    5:6    7:8
0.50            5:4    2:1    5:2    3:1    7:2    7:2    4:1    4:1    9:2
0.75            7:2    8:1    11:1   14:1   16:1   1?:1   20:1
1.00            13:1
1.25
1.50
1.75
2.00

All others greater than 20 to 1.











Section III: Average Reliability = 0.90

Difference in   Correlation Between Variables
S D Units       .85    .80    .75    .70    .65    .60    .55    .50    .40    .00
0.25            1:7    1:5    1:4    2:7    1:3    1:3    2:5    2:5    2:5    1:2
0.50            1:3    1:2    3:4    1:1    1:1    7:6    5:4    4:3    3:2    7:4
0.75            4:5    3:2    2:1    5:2    3:1    3:1    7:2    4:1    4:1    5:1
1.00            3:2    3:1    5:1    7:1    8:1    9:1    10:1   11:1   13:1   17:1
1.25            3:1    7:1    12:1   18:1
1.50            7:1    20:1
1.75
2.00

All others greater than 20 to 1.







Section IV: Average Reliability = 0.85

Difference in   Correlation Between Variables
S D Units       .80    .75    .70    .65    .60    .55    .50    .45    .40    .00
0.25            1:18   1:9    1:7    1:6    1:5    2:9    2:9    1:4    1:4    1:3
0.50            1:6    1:3    2:5    1:2    1:2    2:3    2:3    3:4    4:5    1:1
0.75            1:3    2:3    1:1    8:7    4:3    3:2    5:3    7:4    13:7   5:2
1.00            2:3    4:3    2:1    5:2    3:1    10:3   11:3   4:1    4:1    6:1
1.25            1:1    5:2    4:1    5:1    6:1    7:1    8:1    9:1    10:1   15:1
1.50            5:3    4:1    8:1    10:1   14:1   17:1   20:1
1.75            9:2    13:1
2.00            7:1

All others greater than 20 to 1.



Section V: Average Reliability = 0.80

Difference in   Correlation Between Variables
S D Units       .75    .70    .65    .60    .55    .50    .45    .40    .00
0.25            1:19   1:11   1:9    1:8    1:7    1:6    1:6    1:5    1:4
0.50            1:8    1:5    1:4    1:3    2:5    2:5    1:2    1:2    2:3
0.75            1:4    1:2    2:3    3:4    4:5    1:1    1:1    9:8    3:2
1.00            2:5    4:5    1:1    7:5    8:5    9:5    2:1    11:5   3:1
1.25            5:8    5:4    2:1    5:2    3:1    7:2    4:1    9:2    7:1
1.50            1:1    2:1    3:1    9:2    11:2   7:1    8:1    9:1    13:1
1.75            3:2    7:2    6:1    8:1    11:1   14:1   17:1   19:1   39:1
2.00            2:1    6:1    11:1   16:1

All others greater than 20 to 1.





Section VI: Average Reliability = 0.75

Difference in   Correlation Between Variables
S D Units       .70    .65    .60    .55    .50    .45    .40    .00
0.25            1:33   1:18   1:13   1:11   1:10   1:9    1:8    1:5
0.50            1:12   1:8    1:5    1:4    2:7    2:7    1:3    1:2
0.75            1:6    2:7    2:5    1:2    3:5    2:3    2:3    1:1
1.00            1:4    1:2    2:3    6:7    1:1    6:5    4:3    2:1
1.25            2:5    3:4    1:1    7:5    5:3    2:1    9:4    4:1
1.50            1:2    1:1    2:1    7:3    3:1    7:2    4:1    7:1
1.75            4:5    8:5    5:2    7:2    9:2    6:1    7:1    14:1
2.00            7:6    5:2    4:1    6:1    8:1    11:1   13:1   30:1



Section VII: Average Reliability = 0.70

Difference in   Correlation Between Variables
S D Units       .65    .60    .55    .50    .45    .40    .00
0.25            1:47   1:23   1:16   1:14   1:12   1:11   1:6
0.50            1:20   1:10   1:7    1:5    1:5    1:4    2:5
0.75            1:10   1:6    1:4    1:3    2:5    1:2    3:4
1.00            1:6    1:3    1:2    3:5    7:10   4:5    7:5
1.25            1:4    1:2    5:7    1:1    1:1    4:3    5:2
1.50            1:3    5:7    1:1    3:2    9:5    2:1    4:1
1.75            1:2    1:1    8:5    2:1    3:1    7:2    8:1
2.00            2:3    3:2    5:2    7:2    9:2    5:1    14:1



Section VIII: Average Reliability = 0.60

Difference in   Correlation Between Variables
S D Units       .55    .50    .45    .40    .00
0.25            1:55   1:35   1:24   1:20   1:9
0.50            1:30   1:14   1:10   1:8    1:4
0.75            1:13   1:7    1:5    1:4    1:2
1.00            1:9    1:5    2:7    2:5    4:5
1.25            1:7    2:7    2:5    5:9    13:10
1.50            1:5    2:5    3:5    4:5    2:1
1.75            2:7    3:5    4:5    6:5    3:1
2.00            1:3    3:4    6:5    5:3    5:1




Consider first Section III of Table 1, the section for an average reliability of 0.90, since 0.90 is a fairly representative reliability for good quality ability tests. Note first that no column is shown for an intercorrelation of .90 or higher between the two tests. Whenever the intercorrelation of two tests is as high as their respective reliabilities, they are effectively measures of identically the same trait. Differences between the two are then equivalent to (and equal in number to) differences arising solely from measurement error; there is no basis for a diagnostic judgment, and any diagnostic statement should be made with exactly zero confidence.

Note next that when the difference is small, the betting odds are low that this is a real difference, no matter what the correlation. In the row corresponding to a difference of a quarter of a standard deviation, the odds that the difference is a "real" one range from 1 real difference to 7 chance differences when the correlation is 0.85, to 1 real difference to 2 chance differences when the correlation is zero. Most small differences are readily attributed to measurement errors, and our confidence that there is any "real" difference must be correspondingly low.

Finally, in this table we can see the role that the correlation between two test scores plays in our confidence in the reality of any observed difference. This is seen perhaps as clearly as anywhere in the row corresponding to one full standard deviation of difference, a difference that would correspond roughly to falling at the 70th percentile of a group on one measure and the 30th on the other. For a difference of this size, our betting odds would be 3 to 2 in favor of a "real" difference if the correlation between the two test scores were 0.85, 3 to 1 if the correlation were 0.80, 9 to 1 if the correlation were 0.60, and 17 to 1 if the correlation were zero. The confidence we should have in a diagnostic judgment rises sharply as the correlation between the two measures on which the judgment is based decreases.

To view the effect of test reliability on the confidence appropriate for our judgments, it helps to arrange the tables in a somewhat different way. Table 2 shows the "betting odds" when the size of the difference between X and Y is fixed at one standard deviation, but the values of the average reliability and the intercorrelation are allowed to vary. This table makes it emphatically clear how crucially one's confidence depends upon the reliability of the measuring instruments. If the average of the two reliabilities is 0.98 (one should live to see the day when such measures are available!), even the smallest differences, i.e., those of a quarter of a standard deviation, can be accepted with great confidence as real and not the result of measurement error. With a reliability as low as 0.60, a full standard deviation of difference justifies betting odds of less than even money, even when the correlation between the two measures is zero. For intermediate reliabilities, considerable confidence is justified if the correlation between the two measures is low; relatively little confidence is justified if the intercorrelation approaches anywhere near the reliability.
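Holding the difference at one standard deviation and the correlation fixed, here at 0.50, while the average reliability varies reproduces the organization of Table 2. This sketch makes the same assumptions as before (normal distributions; chance-only spread equal to the square root of 2(1 - average reliability)) and should track the .50 column of Table 2 only approximately; the names are illustrative.

```python
import math


def upper_tail(z):
    """Proportion of a standard normal distribution lying above z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def betting_odds(difference_sd, avg_reliability, r_xy):
    """Real-to-chance odds for an observed difference between two tests."""
    p_chance = upper_tail(difference_sd / math.sqrt(2.0 * (1.0 - avg_reliability)))
    p_observed = upper_tail(difference_sd / math.sqrt(2.0 - 2.0 * r_xy))
    return max(p_observed - p_chance, 0.0) / p_chance


# One standard deviation of difference, intercorrelation fixed at 0.50.
for avg_rel in (0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.60):
    print(f"reliability {avg_rel:.2f}: about {betting_odds(1.0, avg_rel, 0.50):.2f} to 1")
```

The odds drop from well over 20 to 1 at a reliability of 0.95 to roughly 1 to 5 at 0.60, which is the pattern the paragraph describes.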

TABLE 2.

Odds that an Observed Difference of One Standard Deviation between Two Variables Is a Real Difference

Average         Correlation Between Variables
Reliability     .95    .90    .85    .80    .75    .70    .65    .60    .50    .40    .00
.98             All greater than 20 to 1
.95                    13:1   Remainder greater than 20 to 1
.90                           3:2    3:1    5:1    7:1    8:1    9:1    11:1   13:1   17:1
.85                                  2:3    4:3    2:1    5:2    3:1    11:3   4:1    6:1
.80                                         2:5    4:5    1:1    7:5    9:5    11:5   3:1
.75                                                1:4    1:2    2:3    1:1    4:3    2:1
.70                                                       1:6    1:3    3:5    4:5    7:5
.60                                                                     1:5    2:5    4:5



What do the tables that we have looked at imply when this 
type of thinking is carried over to some samples of actual tests 
with the reliabilities and intercorrelations that characterize them? 

Davis (1) has carried out some of the most meticulous research on the differentiability of different types of reading skills. Among the abilities that he studied, two that were most readily distinguishable were word knowledge and drawing inferences. His tests had to be quite short, since he was measuring some eight different aspects of reading, so the reliabilities of these two tests were only .58 and .59. The correlation between them had an average value of .45 in several sets of data. Given these values, the betting odds are only 1 to 4 that a difference of one standard deviation between scores on the two tests is "real"; for a difference of two standard deviations the betting odds are 9 to 8. As they stand, the tests hardly justify diagnostic inferences even when the differences are very large. But these tests were short, only 12 items each. If they were lengthened to 48 items, which might be a reasonable length for a test in practical use, one estimates that the reliabilities would be increased to .85 and .86, and the intercorrelation to .66. Then the betting odds are, respectively, 5 to 2 for a difference of one standard deviation and 80 to 1 for a difference of two standard deviations. Thus, we see how very critically diagnostic inferences depend upon the reliabilities of the constituent measures.
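The chapter does not say how the lengthened-test figures were estimated. One standard way to make such estimates is the Spearman-Brown formula for reliability, together with the corresponding formula for the correlation between two tests that are both lengthened by the same factor. The sketch below assumes that approach and treats the added items as parallel to the existing ones. It gives roughly .85, .85, and .65 for Davis' word-knowledge and inference tests, close to but not identical with the .85, .86, and .66 quoted, so the original estimates may have started from unrounded values or used a different method. The function names are illustrative.

```python
import math


def spearman_brown(reliability, k):
    """Estimated reliability of a test lengthened by a factor of k
    (k = 4 means four times as many parallel items)."""
    return k * reliability / (1.0 + (k - 1.0) * reliability)


def lengthened_correlation(r_xy, rel_x, rel_y, k):
    """Estimated correlation between tests X and Y when both are lengthened
    by a factor of k; measurement errors are assumed uncorrelated."""
    return k * r_xy / math.sqrt(
        (1.0 + (k - 1.0) * rel_x) * (1.0 + (k - 1.0) * rel_y)
    )


k = 48 / 12  # from 12 items to 48 items
print(round(spearman_brown(0.58, k), 2), round(spearman_brown(0.59, k), 2))
print(round(lengthened_correlation(0.45, 0.58, 0.59, k), 2))
```

Lengthening raises the intercorrelation as well as the reliabilities, which is why the gain in diagnostic confidence is real but smaller than the reliability increase alone would suggest.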

Two of Davis' tests that measure more similar functions are the test of inference and a test that calls for identification of the author's tone, mood, and purpose. Here the reliabilities are .59 and .63, and the intercorrelation is .52. Given those values, the betting odds for the existing tests are only 2 to 11 for a difference of one standard deviation and 3 to 5 for a difference of two standard deviations. Lengthened to 48 items, the reliabilities become .84 and .88 and the intercorrelation 0.75. For these lengthened tests, the betting odds are 5 to 3 that an observed difference of one standard deviation is "real" and 23 to 1 for a difference of two standard deviations.

Let us turn our attention now to the Stanford Diagnostic Reading Tests, the distinctive value of which is presumed to lie in their diagnostic effectiveness. Here, unfortunately, the manual provides only single-testing estimates of reliability, and these are certainly somewhat inflated. We cannot know how much. If we take the figures at face value, the average of the subtest reliabilities is 0.90, and the average of the subtest intercorrelations is 0.65. A more realistic estimate of alternate-form reliabilities might be 0.85. If we assume that figure and turn to Section IV of Table 1 for reliability 0.85, we find figures in the column for intercorrelations of 0.65 as follows:



0.25 S.D. (which would occur for 38% of children)    1 to 6

0.50 S.D. (which would occur for 27% of children)    1 to 2

0.75 S.D. (which would occur for 19% of children)    8 to 7

1.00 S.D. (which would occur for 12% of children)    5 to 2

1.50 S.D. (which would occur for 4% of children)     10 to 1

2.00 S.D. (which would occur for 1% of children)     over 20 to 1
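The percentages in parentheses can be reproduced, to within about a percentage point, from equation (3) alone. With an intercorrelation of 0.65, differences between two subtests have a standard deviation of the square root of 2 - 2(0.65), and each percentage matches the share of children whose difference is at least the stated size in one given direction. This sketch assumes normally distributed differences; the function name is illustrative.

```python
import math


def share_with_difference(difference_sd, r_xy):
    """Share of children whose score on one test exceeds their score on the
    other by at least difference_sd standard deviations (one direction)."""
    sd_of_differences = math.sqrt(2.0 - 2.0 * r_xy)  # equation (3)
    z = difference_sd / sd_of_differences
    return 0.5 * math.erfc(z / math.sqrt(2.0))


for d in (0.25, 0.50, 0.75, 1.00, 1.50, 2.00):
    print(f"{d:.2f} S.D.: {100 * share_with_difference(d, 0.65):.1f}% of children")
```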



Thus, if we limit our diagnostic inferences to the one percent with the most extreme differences, our judgments will almost always have a real basis. If we set a lower threshold and undertake diagnostic statements for as many as 10 percent of children, there will be a basis in reality for something like three-fourths of our judgments. If we set a still more liberal standard and venture diagnostic statements based on observed differences for as many as 20 percent of the group, the statements will correspond to real differences only about half the time.
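The three proportions in this paragraph follow from the same assumptions. The sketch below (names illustrative) takes the share of children we are willing to flag, finds the cut-off difference that flags that share given an intercorrelation of 0.65, and then estimates what fraction of the flagged differences would not have been produced by measurement error alone, using the assumed alternate-form reliability of 0.85. It relies on `statistics.NormalDist.inv_cdf` from the Python standard library (Python 3.8 or later) to find the cut-off.

```python
import math
from statistics import NormalDist


def fraction_real(share_flagged, avg_reliability, r_xy):
    """Fraction of flagged differences expected to be real when the given
    share of children (largest differences, one direction) is flagged."""
    sd_observed = math.sqrt(2.0 - 2.0 * r_xy)             # equation (3)
    sd_chance = math.sqrt(2.0 * (1.0 - avg_reliability))  # equation (2), averaged
    cutoff = sd_observed * NormalDist().inv_cdf(1.0 - share_flagged)
    p_chance = 0.5 * math.erfc(cutoff / sd_chance / math.sqrt(2.0))
    return max(1.0 - p_chance / share_flagged, 0.0)


for share in (0.01, 0.10, 0.20):
    print(f"flag {share:.0%}: about {fraction_real(share, 0.85, 0.65):.0%} real")
```

The fractions come out near 98, 75, and 50 percent, in line with "almost always", "three-fourths", and "about half the time".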

Finally, consider a set of data for the reading test of the Stanford Achievement Battery given once in the sixth and once in the eighth grade. For one suburban New York school system, the correlation between the two testings was .747. An estimate of reliability drawn from the test manual is .93. How much would a child have to change his position in his group from the first to the second testing for us to have an even-money bet that there was a real change? The answer comes out to be 0.40 standard deviations. If a child were to improve his position in his group by four-tenths of a standard deviation (for example, from the 50th to the 65th percentile), it is a fifty-fifty proposition that this represents some degree of real change and not just the effect of measurement errors.
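The 0.40 figure is the change at which real and chance differences are equally likely, i.e. where the betting odds equal 1 to 1. Assuming normal distributions and treating the .747 correlation between testings as the correlation between the two measures, as the text does, a simple bisection search finds it. The function names are illustrative, and the search relies on the odds rising steadily as the size of the change grows.

```python
import math
from statistics import NormalDist


def upper_tail(z):
    """Proportion of a standard normal distribution lying above z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def betting_odds(change_sd, reliability, r_xy):
    """Real-to-chance odds for a change of change_sd standard deviations."""
    p_chance = upper_tail(change_sd / math.sqrt(2.0 * (1.0 - reliability)))
    p_observed = upper_tail(change_sd / math.sqrt(2.0 - 2.0 * r_xy))
    return (p_observed - p_chance) / p_chance


def even_money_change(reliability, r_xy, lo=0.0, hi=2.0, steps=60):
    """Bisection for the change whose betting odds are 1 to 1."""
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if betting_odds(mid, reliability, r_xy) < 1.0:
            lo = mid  # odds still below even money: a larger change is needed
        else:
            hi = mid
    return (lo + hi) / 2.0


d = even_money_change(0.93, 0.747)
print(f"{d:.2f} SD; the 50th percentile moves to about the {100 * NormalDist().cdf(d):.0f}th")
```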

The tables and illustrations that we have examined show the impact of reliability, intercorrelation, and score difference upon the confidence that one can logically place in an observed difference between two scores. They illustrate that over the realistic range of test reliabilities, and using the kinds of pairs of measures that we are likely to want to use in diagnostic studies, the confidence is often distressingly low. But children with reading disabilities are there, and they won't just go away until that happy day when we have diagnostic tools of reliability high enough to permit us to make judgments of score difference at a high level of confidence. Therein lies our dilemma. Wherein do we find our salvation?

If salvation exists, it lies in the fact that most of the actions following from diagnostic judgments are reversible, and if they are unfounded they are likely to result in wasted time or effort rather than any more crucial loss. In this respect, instructional decisions differ from selection and classification decisions, since these are typically permanent. The young person who is denied access to a particular educational institution or job is not likely to be given a second chance. But if the special instruction in word-analysis skills that seems to be called for by a diagnostic reading profile is not effective, it is always possible to hold up, take stock, get new or additional evidence, and follow up some alternative hypothesis. Our tables of betting odds suggest how tentative our hypotheses should often be. Fortunately, they often can be tentative. It is important that we keep them so.

References

1. Davis, Frederick B. "Research in Comprehension in Reading," Reading
Research Quarterly, 3 (Summer 1968), 499-545.

2. Kelley, Truman L. Interpretation of Educational Measurements. New
York: World Book, 1927.



THORNDIKE 






Mary M. Brittain 
University of Wisconsin 



GUIDELINES FOR EVALUATING 
CLASSROOM ORGANIZATION 
FOR TEACHING READING 



The variety of classroom organizational patterns for reading in- 
struction is enormous. Even a limited sampling of the organiza- 
tional "smorgasbord" renders one intellectually replete: total class 
grouping arrangements (whether on a temporary basis as for choral 
reading or open text sessions, or on a more permanent footing as 
in tracking or special reading classes); cross-class groupings, such as 
in the ungraded primary or Joplin approaches; intraclass grouping, 
including bi-, tri-, or multibasal grouping, grouping by invitation, 
grouping to meet special interests or skill needs of pupils, student- 
led small team grouping, tutorial grouping. The spectrum extends 
to complete individualization of instruction. 

These multitudinous organizational patterns are all attempts to 
increase the teacher's efficiency in meeting the reading needs of 
individual children, and most, though not all of them, seek to do 
this through the reduction of pupil heterogeneity. (Actually, a 
number of the plans share many other common attributes, a factor 
which complicates the evaluative process.) While it is not feasible 
here to undertake any extensive comparison of organizational pat- 
terns, it is perhaps possible to suggest a combination of evaluative 
approaches that will render such comparisons meaningful. 

EVALUATION THROUGH STANDARDIZED TESTS 

In the main, the evaluation, not only of classroom organization,
but of most of the elements of the reading program, has been
done in terms of their impact on student growth (5), and growth
has been measured most frequently by means of standardized
tests. The inadequacies of standardized tests as measures of pupil
growth are well known, but it may be helpful to note some of the
distinctions between the processes of testing and evaluation, and to
question the assumption that student growth in reading skill
should form the sole basis for evaluation.

Ammons (1) has defined evaluation as the "description of student
progress toward educational objectives," and has noted that,
in contrast to testing, evaluation is directed more to individuals
than to groups, and seeks to describe the progress of the individual
student toward certain school-defined objectives. Standardized
tests, on the other hand, are not typically criterion-referenced; 
rather they aim to compare the progress of a group (less success- 
fully that of an individual) with that of other (normative) groups 
and, while these tests may indicate the level of a group's perform- 
ance, they seldom provide insights as to why a group performs as 
it does. 

Standardized measures have some further shortcomings as evaluative
instruments. If we assume that evaluative procedures should
be ongoing and should provide sufficient examples of a student's
work to sample the various skills of reading adequately, then any
one-shot, temporally discrete testing method will be found wanting.
The importance of repeated sampling in evaluation can
scarcely be overemphasized: significant differences in test performance
have been demonstrated with changes in examiner, test
content, physical setting, and time of day or year.

If standardized measures are to yield any relevant information,
care must be taken to select tests that are appropriate to the
content of the reading program: tests that actually measure the
behaviors deemed important in accomplishing the school's objectives.
Testing instruments may, in fact, bear little relation to the
objectives of a particular educational program. For example, the
use of a typical reading achievement test to evaluate an organization
whose primary objective is the development of more positive
attitudes toward reading would be an exercise in futility.

GOAL REFERENCED EVALUATION 

If, as Barrett (3) has suggested, instructional goals should form
the basis of evaluation, then the philosophy of the school regarding
reading, and the manner in which the school defines the reading
process, assume importance. While it is certainly possible to
define reading differentially, it will be assumed for the purposes of
this paper that reading is not solely a perceptual or cognitive process
(though it certainly subsumes these elements), but includes
affective aspects such as appreciation and enjoyment. This being
the case, evaluation of any organizational strategy for reading instruction
may well begin, as Russell and Fea (9) have suggested,
with an analysis of the characteristics of successful readers in all
the above-mentioned parameters of reading (their habits, skills,
attitudes, and interests) and of the behaviors that are implicit in the
development of these characteristics. Such an analysis should yield
hypotheses regarding the ideal classroom arrangement for evoking
the desired behaviors. For example, if one assumes that the successful
reader is characterized by concentration on the meaning of
a selection, one may hypothesize that solitude while reading, free
from group distractions, would be conducive to the development
of concentration, and therefore opt for an individualized approach.
However, one may also conceptualize the good reader as one who
can successfully interpret a selection for the pleasure or profit of
others. One might then suppose that small-group organization for
oral reading would be desirable. Evaluation would proceed in
terms of how successfully the organizational pattern promoted the
skills or attitudes enumerated in advance; in effect, a criterion-referenced
approach. The evaluation should also include a statement
of which features of the classroom organization plan
appeared to contribute to which outcomes.

Research on the effectiveness of grouping for reading instruction
is neither copious nor consistent, perhaps partly because of
the application of inappropriate measures, but also because of the
failure of research studies to include sufficiently detailed descriptions
of the instructional practices employed, that is, the implementation
of the organizational procedures. The conflicting results
from the USOE first grade studies of reading instruction (4) offer
ample testimony to the difficulty of being sure of what one is
actually evaluating.

Classroom organizational patterns for teaching reading, whatever
their particular form, must answer to the demands inherent in
the nature of reading, those inherent in the learner, and those
inherent in the resources of the school. In an effort to provide
some useful guidelines for evaluation, certain organizational
characteristics may be hypothesized as favoring the fulfillment of
these multiple demands. The following organizational standards
have been so derived.

Goal direction 

Given the non-unitary character of the reading process, which
presupposes a multiplicity of instructional goals, evaluation of
classroom organizational procedures (whether the classroom is
self-contained or of the multi-unit sort) must include an estimate
of their efficacy in promoting the various aspects of reading. Further,
grouping strategies should be examined for their facilitation
of growth toward expressly stated goals. Some examples follow:

Word Perception: Does the organization promote flexibility in
methods of word attack?

Comprehension: Does the organization lead to inferential and
critical responses to what has been read?

Appreciation: Does the organization facilitate responses to
artistic, humorous, or stylistic elements of selections?

Rate of Reading: Does the organization foster flexibility of
reading rate?

Oral Reading: Does the organization develop skill and enjoyment
in oral reading?

Study Skills: Does the organization advance growth in reading
specific to the various content areas?

Flexibility 

For many years it has been suggested that good organizational
strategy should provide for flexibility of group size and membership,
that students should be afforded the opportunity to work
not only with the whole class, but with small groups and individuals.
In view of the complexity of the reading process and the
probability that different organizational arrangements will be conducive
to differential skill development, the criterion of flexibility
would appear sound. A further advantage of group flexibility is
that learners have the opportunity to work with others who may
or may not be similar in general attainment, but who share common
skill development needs or common interests. Moreover,
flexibility of grouping would reduce the likelihood of a stigma
being attached to perpetual membership in a "low" group, thus
contributing to the healthy development of the self-concept.



Interests 

Current statistics relating to the reading habits of adults (2)
demonstrate all too clearly that instructional programs in the
nation's schools could be greatly improved insofar as the promotion
of reading as a leisure-time pursuit is concerned. Concern for
the affective aspect of reading suggests that the organizational plan
adopted should not only allow for pupils' self-selection of materials
and self-pacing in these, but should permit the extension of
interests through teacher-pupil and pupil-pupil exchanges.

Independence 

An important goal of reading instruction is to produce mature
readers who are able to gain both pleasure and profit from printed
material without the constant assistance or direction of a teacher.
Data from transfer of training studies (7) suggest that self-direction
in reading activities should receive an early introduction. Supplementation
of teacher-directed groups with those directed by
individual students should also increase pupil awareness of program
goals through greater involvement in planning and implementation.

Homogeneity 

While heterogeneity is a healthy fact of group life, any group- 
ing, whether by ability, achievement, or interest, should be suffi- 
ciently homogeneous to afford a reasonable opportunity of suc- 
cess or self-fulfillment to the members. This is not to suggest that 
rigid, narrow criteria for group membership should be established, 
but requiring an individual's membership in a group that is grossly 
divergent from him in needs, preferences, and attainments can 
hardly be justified on either cognitive or affective grounds. 

Instructional Personnel 

The organizational plan should be realistic in terms of the
degree of teacher expertise required and should be mindful of
individual differences between teachers as well as pupils. For
example, it has long been noted that individualized reading programs,
while extremely effective in improving children's attitudes
toward reading, require teachers of notable independence and
competence. A teacher with more modest attainments, or one who
is more secure within a structured framework, would probably
function more effectively under an alternative arrangement.
Teachers should be able to select, from a number of organizational
patterns, those that enhance, rather than inhibit, their effectiveness.

Administration 

Certain logistical problems must also be addressed. Do the
proposed grouping procedures promote ease of scheduling? Do
they require resources (whether of materials, space, or personnel)
that are within the school's capacity?

A checklist based upon the foregoing characteristics of grouping
practices is appended to this paper. The checklist suggests
important areas of concern relating to the learner, the reading
process, and administration and, within these categories,
includes sample questions that may guide the evaluation of classroom
organizational patterns for teaching reading.

ALTERNATIVE EVALUATIVE PROCEDURES 

If it is assumed that, given the complexity of the reading process,
a variety of evaluative techniques is requisite, what supplements
to standardized measures are available? It must be admitted
at the outset that there exists no single well-established theory of
methodology for measuring classroom behavior, but some reasonable
possibilities (not without their own limitations as to reliability
and relevance) include:

Repeated systematic observation within the classroom through
such media as film, kinescope recordings, and observers utilizing rating
scales. (If, for example, one wishes to assess interest in reading as a
leisure-time pursuit, observation of free-choice situations in which
students may select from a variety of activities should provide valuable
insights regarding level of interest in recreational reading.)

Paper and pencil measures, such as anecdotal records kept by
teachers and pupils relating to types and amounts of reading done;
records of comprehension difficulties; vocabulary files; interest and 
attitude inventories; social adjustment measures; teacher-made 
checklists of reading skills. 

Informal estimates of reading habits, skills, attitudes, and interests
such as may be derived from performance on informal reading
inventories, performance in content area reading, tapes of students'
oral reading, records of student library usage, and of out-of-school
reading habits. Students and parents, as well as teachers, may contribute
to these informal evaluations.



CONCLUSION 

The aim of this paper has been to present a theoretical framework
for the evaluation of organizational patterns for reading instruction
and to suggest some supplementary approaches to the
traditional use of standardized measures. Of necessity, the treatment
has involved sampling from many different aspects of reading
instruction, since any organizational strategy exists chiefly to
advance the many and complex goals of reading instruction. It is
on the quality of service to all these masters that evaluation of
organizational patterns must proceed.

A Checklist for Evaluating Classroom Organization 
for Teaching Reading 

STUDENT CHARACTERISTICS 

Physiological 

Do pupils have sufficient opportunities for movement? 

Are special sensory needs of pupils met? 

Social 

Are students' group roles clearly defined? 

Is student-direction of learning situations encouraged? 

Affective 

Is a reasonable opportunity of success ensured? 

Is stigmatization avoided? 

Can the varied interests of students be met?

Cognitive 

Are the experience backgrounds of students utilized? 

Can differing rates of learning be provided for?

Are pupils able to exchange opinions regarding selections? 

Educational 

Are planned experiences to meet specific skill needs possible?

Are sufficient opportunities provided for diagnosis?

INSTRUCTIONAL GOALS 

Word Recognition

Are a variety of word recognition methods practiced?

Is accuracy of word perception facilitated?

Comprehension

Are inferential as well as literal skills facilitated?

Are exchanges of critical judgments encouraged?

Appreciation

Are students' reading interests broadened? 

Are students' reading attitudes and habits improved? 

Rate of Reading 

Is flexibility of rate encouraged? 

Is pressure to maintain a group standard of rate avoided?

Oral Reading 

Is a reasonable balance maintained between oral and silent reading?

Are meaningful alternatives to round-robin reading provided?

Content Area Reading

Are the various study skills practiced?

Is there opportunity for students to apply what they have read?

IMPLEMENTATION

Teacher Personnel 

Is teacher expertise maximally utilized?

Are teacher preferences and interests considered? 

Data Collection 

Can adequate samples of students' reading behavior be obtained?

Can information regarding growth in specific skills be obtained?

Scheduling

Is scheduling simplified?

Can flexibility of instructional time be maintained?

Materials

Can a variety of materials be employed? 

Are the requisite materials within the school's financial resources? 



References 

1. Ammons, Margaret. "Evaluation: What Is It? Who Does It? When Should
It Be Done?" in Thomas C. Barrett (Ed.), The Evaluation of Children's
Reading Achievement. Newark, Delaware: International Reading
Association, 1967, 1-12.

2. Asheim, Lester. "What Do Adults Read?" Fifty-Fifth Yearbook of the
National Society for the Study of Education, Part II. Chicago: University
of Chicago Press, 1956, 5-28.

3. Barrett, Thomas C. "Goals of the Reading Program: The Basis for Evaluation,"
in Thomas C. Barrett (Ed.), The Evaluation of Children's Reading
Achievement. Newark, Delaware: International Reading Association,
1967, 13-26.

4. Bond, Guy L., and Robert Dykstra. "The Cooperative Research Program
in First Grade Reading Instruction," Reading Research Quarterly, 2
(Summer 1967), 5-142.

5. Farr, Roger. Reading: What Can Be Measured? Newark, Delaware: International
Reading Association, 1969.

6. Heathers, Glen. "Grouping," in Robert L. Ebel (Ed.), Encyclopedia of
Educational Research (4th ed.). New York: Macmillan, 1968, 559-570.

7. Klausmeier, Herbert J. "Transfer of Learning," in Robert L. Ebel (Ed.),
Encyclopedia of Educational Research (4th ed.). New York: Macmillan,
1968, 1483-1493.

8. Medley, Donald M., and Harold E. Mitzel. "Measuring Classroom Behavior
by Systematic Observation," in N. L. Gage (Ed.), Handbook of
Research on Teaching. Chicago: Rand McNally, 1963, 247-328.

9. Russell, David H., and Henry R. Fea. "Research on Teaching Reading," in
N. L. Gage (Ed.), Handbook of Research on Teaching. Chicago: Rand
McNally, 1963, 865-928.



Morton Botel 
University of Pennsylvania 
John Dawkins 
Research for Better Schools 
Philadelphia, Pennsylvania 

Alvin Granowsky 
Diagnostic Reading Center 
Greensboro, North Carolina 



A SYNTACTIC 
COMPLEXITY FORMULA 



A reliable and valid measure of complexity of syntactic structures
is of theoretical interest and should be helpful in preparing and
selecting reading materials. A count of the number of words per
sentence, the measure most widely used at this time, has been
judged inadequate by newer theories of grammar as well as by
research findings. Other generally used measures of syntactic complexity
focus only on a few syntactic structures which correlate
somewhat with reading complexity, but in no way indicate the
relative complexity of the major portion of syntactic structures
found in reading materials.

In an attempt to overcome these inadequacies, a heuristic was
developed: a syntactic complexity formula (1, 2). This formula is
based on 1) a theory of transformational grammar that suggests
that complex sentences can be thought of as derived from processes
of changing and combining underlying structures (simple
sentences, for our purposes); 2) experimental data on children's
processing of syntactic structures; and 3) language development
and performance studies of the oral and written language used by
children.

A measuring device that takes into account multiple factors of 
syntax will reveal a great deal of information about what is and 
what is not hard to process for the young reader. However, the
device will also have limitations that must be mentioned. First, 
there are a number of factors in syntax, and many factors in 
semantics, that do not readily lend themselves to measurement. 



Second, there are small degrees of differences in syntactic difficulty
that cannot be rated on a scale without making it far too
cumbersome to use. For this reason, we have rated many items as
equivalents, when some differences in their complexity clearly
exist.

Finally, two cautions need to be noted in using the syntactic
complexity formula: 1) it should be used in conjunction with a
measure of vocabulary; and 2) the value of the instrument lies not
in giving a precise measurement but in ranking syntactic structures.

To apply the syntactic complexity formula to any passage,
each sentence in the passage is assigned a complexity rating. These
ratings are then averaged to obtain the complexity rating for the
entire passage. The complexity rating for a sentence is determined
by comparing the structure of the sentence to the structures described
and illustrated on the following pages. The basic structure
of the main clause of the sentence is assigned a count of 0, 1, or 2,
and counts are added for additional features or structures that add
complexity. For example, the sentence "His vacation over, the tired
doctor drove home" has a complexity count of 4. The basic structure
SV(Adv) (The doctor drove home) gets a count of 0 (see I A
under "0-Count Structures"). Since the subject (doctor) is modified
by an adjective (tired), a count of 1 is added (see III A
under "1-Count Structures"). The absolute (His vacation over) at the
beginning of the sentence adds an additional count of 3 (see II
under "3-Count Structures"). The whole sentence thus receives a
count of 0 + 1 + 3 = 4.
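The additive scoring just described can be sketched in a few lines of Python. This is a minimal sketch, not the authors' procedure: the structure labels and the small lookup table are hypothetical names chosen for illustration, the counts are copied from the structure lists in this paper, and recognizing which structures a sentence contains is still done by hand.

```python
# Hypothetical lookup table; counts copied from the structure lists.
STRUCTURE_COUNTS = {
    "SV(Adv)": 0, "SVO": 0, "SbeC": 0, "SVInf": 0,  # 0-count basic patterns
    "SVIO": 1, "SVOC": 1,                           # less frequent patterns
    "adjective": 1, "prepositional phrase": 1, "modal": 1, "negative": 1,
    "passive": 2, "dependent clause": 2, "appositive": 2,
    "clause as subject": 3, "absolute": 3,
}

def sentence_count(pattern: str, features: list[str]) -> int:
    """Count for one sentence: the basic pattern plus one entry per
    added structure, as identified by a human scorer."""
    return STRUCTURE_COUNTS[pattern] + sum(STRUCTURE_COUNTS[f] for f in features)

# "His vacation over, the tired doctor drove home."
print(sentence_count("SV(Adv)", ["adjective", "absolute"]))  # 4
```

Keeping the counts in a table, rather than in branching logic, mirrors how the formula is presented: a catalogue of structures, each with a fixed weight.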



0-Count Structures                                  Count

I. The Most Frequently Used Simple Sentences

A. Subject-Verb (Adverbial)   SV(Adv)

He went.                                            0
Bob had gone.                                       0
The boy had gone.                                   0
That boy had gone (home).                           0
Those girls have been playing (at their house).     0

B. Subject-Verb-Object   SVO

She hit it.                                         0
The fish weighed a pound.                           0
Those girls have been hitting my ball.              0

C. Subject-Verb-be-Complement*   SbeC

pattern 1: adjective   SbeC-adj.

He is big.                                          0
He is very big.                                     0
The girl seemed big.                                0
These children will have grown big.                 0

pattern 2: noun   SbeC-noun

She became president.                               0
That girl was their president.                      0
Those students have been our presidents.            0

pattern 3: adverbial   SbeC-adv.

He is there.                                        0
It is there.                                        0
She will be there.                                  0
Those girls have been in their homes.               0

D. Subject-Verb-Infinitive   SVInf

Bob wanted to go.                                   0
These girls will want to eat.                       0
Our children have been waiting to eat.              0

II. Simple Transformations

A. Interrogative

1. Simple question:

Will he run?                                        0
Did he do it?                                       0

2. Tag-end question:
Declarative sentences can become questions by adding sentence tags:

The game was good, wasn't it?                       0
He did it, didn't he?                               0

B. Exclamatory

What a game!                                        0
What a game it was!                                 0
How wonderful!                                      0

C. Imperative

(You) Get the milk. (!)                             0
(You) Go to the store. (!)                          0

*Linking verbs such as seem, become, and turn are included in the category of "be" verbs.



III. Coordinate Clause Joined by And

Research indicates that the coordinate clause joined by and
represents one of the most common, easily processed structures in
the language.                                       Count

John went to the store.                             0
Mary went to the store.                             0
John went to the store and Mary went to the store.  0



IV. Nonsentence Expressions

1. noun of direct address:       Is that you, MARY?        0
2. greetings:                    Hi, Hello                 0
3. calls and attention getters:  Hey                       0
4. interjections:                What, Wow, Oh             0
5. responses:                    Okay, Good-by, So long    0
6. empty phrases:                Really now, You know      0
7. sentence openers:             Please, Then, But then    0



Note:

I. There is no extra count for these expansions of simple sentences:

A. Verb expansions: 1. forms of be, have, and do
                    2. will and can

B. Intensifier expansions: very, too, so, much, more,
   even when two are used together, as in much more, so
   very, etc.

C. Determiner expansions:
   1. articles: a, an, the
   2. demonstrative pronouns: this, these, that, those
   3. possessive pronouns: my, our, your, his, its, their


1-Count Structures

I. Two Less Frequently Used Sentence Patterns       Count

A. Subject-Verb-Indirect Object-Direct Object   SVIO

He threw HER the ball.                              1

B. Subject-Verb-Object-Object Complement   SVOC

They made him HAPPY.                                1

II. Any Prepositional Phrase Added to Any 0-Count Pattern

A. Subject-Verb (Adverbial)*
The boy had gone home in the morning.

B. Subject-Verb-Object
The girl threw the ball TO the catcher.

C. Subject-be-Complement
The man behind the desk was big.

D. Subject-Verb-Infinitive
Bob wanted to go BEFORE BILL.

III. Noun Modifiers

A. Adjectives
The BIG man ate here.

B. Nouns
Their team ate the APPLE pie.

C. Predeterminers (one of, two of, many of, both of)
ALL OF the players won the game.

D. Possessive Nouns
The hat fit his SON'S head.

E. Participles (-ed and -ing forms in the natural adjective position)
The CRYING boy ran home.
The SCALDED cat ran home.



GENERAL RULE:

A 0-Count sentence has three or fewer lexical words.
A 1-Count sentence generally has four lexical words.

Lexical words are nouns, verbs, adjectives, and adverbs.
Prepositions are not counted as lexical words. In general, each
lexical word added to a basic sentence pattern is given a 1-count.



IV. Other Modifiers

A. Adverbial Additions to the 0-Count Sentence

He ran to the store LATER.                          1
He QUICKLY went to the store.                       1

B. Modals
(could, dare to, has to, may, might, must, need to, ought to, shall,
should, would)

He MIGHT have won the game.                         1

C. Negatives (no, not, neither, never, n't)

He did NOT see it.                                  1
He didN'T do it, did he?                            1

*The first adverbial in a subject-verb (adverbial) pattern is not given a count.

V. Set Expressions

These are phrases that are usually strung together. They are
given a 1-count, even if their lexical number is higher than one.

Many years ago, once upon a time, (every) once in a
while, (every) now and then, a ___ year old (modifier),
___ years old (complement), more or less, etc.



VI. Infinitives 

When the infinitive does not immediately follow the verb, it is
considered an expansion of the basic sentence pattern and given a
count.

They wanted the baby TO SLEEP. 1 
They tried hard TO REST. 1 

VII. Gerund 

When the gerund is a subject, it is given a count. (In all other
uses, the gerund is counted as any other noun.)

RUNNING is fun. 1 

VIII. Coordinate Clause (joined by coordinate conjunctions other than
and: for, but, so, yet, or)

John worked hard.                                   0
He played hard.                                     0
John worked hard BUT he played hard.                1
The boy did that job OR you did it.                 1
The BIG boy did that job OR you did it.             1 + 1 = 2

IX. Deletion in Coordinate Clauses

This process is already accounted for by the "General Rule" on
lexical additions. Note that and is included here.

John was thin.                                      0
John was healthy.                                   0
John was thin and HEALTHY.                          1

Joe jumped into the water. Pete jumped into the water.   0 + 0
Joe and PETE jumped into the water.                 1
Joe and his FRIEND PETE jumped into the water.      1 + 1 = 2

X. The Paired Conjunction both . . . and

BOTH Bob did it AND BILL did it.                    1
BOTH Bob AND BILL did it.                           1 + 1 = 2



2-Count Structures                                  Count

I. Passive Transformations

The ball was hit by Bob.                            2
The ball was hit. (by Bob, understood)              2

II. Paired Conjunctions (either . . . or, neither . . . nor, not . . . but,
etc.)

NEITHER Pete did it NOR Bill did it.                2

When deletion is involved, simply count the lexical items.

NEITHER Pete NOR BOB did it.                        2 + 1 = 3
NEITHER Pete NOR my FRIEND BOB did it.              2 + 1 + 1 = 4

III. Comparatives (as . . . as; same as; -er than; more . . . than)

Bob was AS tall AS Bill (is).                       2
She is MORE attractive THAN you (are).              2

IV. Dependent Clause

A. Adjective clauses

The book (that) I read was great.                   2
The postman, who delivers the mail, is nice.        2

B. Adverbial clauses

He left WHEN HE FINISHED.                           2
He came early so that he could buy the gift.        2

C. Nominal clauses

He asked me WHAT I DID.                             2

V. Participle

When a participle is attached as a modifier in a typical adjective-noun
order, give it a 1-count. But when the participle appears after the
noun or is separated from it by commas, give it a 2-count.

BOILING, the water overflowed the pan.              2
The water, boiling, overflowed the pan.             2
YOWLING, the scalded cat ran home.                  2 + 1 = 3
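Because the participle rule depends only on position, it reduces to a two-way test. The sketch below is a minimal illustration under that reading; the parameter names are hypothetical stand-ins for the two positional facts a human scorer would supply.

```python
def participle_count(before_noun: bool, set_off_by_commas: bool) -> int:
    """Count for a participle used as a noun modifier.

    1-count in the natural adjective position (directly before the noun);
    2-count when it follows the noun or is set off by commas.
    """
    if before_noun and not set_off_by_commas:
        return 1
    return 2

# "YOWLING, the scalded cat ran home."
# YOWLING is set off by a comma (2); "scalded" sits before its noun (1).
print(participle_count(True, True) + participle_count(True, False))  # 3
```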

VI. Infinitive as Subject

TO RUN is healthy.                                  2

VII. Appositive

To be considered a 2-count appositive, the structure must be a
noun phrase set off by commas.

His good friend, a pretty girl, arrived.            4
(adjectives good, pretty = 2-count;
appositive a girl = 2-count)



VIII. Conjunctive Adverbs

Examples: thus, moreover, however, therefore, consequently,
nevertheless (and also still and yet when used as conjunctive adverbs).

                                                    Count

I went, NEVERTHELESS.                               2
YET, everyone applauded.                            2



3-Count Structures                                  Count

I. Clauses Used as Subjects

THE FACT THAT HE EATS is important.                 3
THAT HE EATS is important.                          3

II. Absolutes

THE JOB FINISHED, Bob went home.                    3
Mr. Smith lit his pipe, THE PERFORMANCE OVER.       3



Special Handling

I. Noun Clause of Dialogue

Procedure for counting: Separate the speaker from what is said
and count the parts as two sentences.

(1) John said, "I will go." = 0-count
    (a) John said. = 0-count
    (b) I will go. = 0-count

If either part carries a count, consider it as you would any
sentence:

(2) The big bird chirped, "Go away!" = 1-count
    (a) The big bird chirped. = 1-count for adjective
    (b) Go away! = 0-count

Structures similar in format to the noun clause of dialogue, such as
those with say, wonder, believe, and feel, are handled in the same manner.

(3) I wondered who would do it. = 1-count
    (a) I wondered. = 0-count
    (b) who would do it. = 1-count for modal

(4) Those terrible boys who live on our street said that we
should go. = 4-count
    (a) Those terrible boys who live on our street said = 1-count
    for the adjective, 2-count for the adjective clause; total = 3-count.
    (b) (that) we should go. = 1-count for the modal
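The dialogue procedure amounts to scoring two parts separately and adding the results. A minimal sketch, assuming both parts rest on 0-count basic patterns (as in the examples above); the feature labels and table are hypothetical, with counts taken from the structure lists:

```python
# Hypothetical per-feature counts, copied from the structure lists.
FEATURE_COUNTS = {"adjective": 1, "modal": 1, "adjective clause": 2}

def part_count(features: list[str]) -> int:
    """Count for one part, assuming its basic pattern is 0-count."""
    return sum(FEATURE_COUNTS[f] for f in features)

def dialogue_count(speaker_features: list[str], speech_features: list[str]) -> int:
    """Score the speaker part and the reported part as two sentences, then add."""
    return part_count(speaker_features) + part_count(speech_features)

# Example (4): "Those terrible boys who live on our street said that we should go."
print(dialogue_count(["adjective", "adjective clause"], ["modal"]))  # 4
# Example (2): The big bird chirped, "Go away!"
print(dialogue_count(["adjective"], []))  # 1
```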

II. Inverted Order of Adverbials of Manner and Place

Whenever these adverbial structures begin the sentence, add a
1-count to the scoring you would typically give:

                                                    Count

He ran to the store QUICKLY.                        1
QUICKLY, he ran to the store.                       2

III. Names and Titles

Names and titles, whatever their length, should be regarded as a
simple noun in scoring.

MR. WILLIAM JONES is here.                          0
THE AMERICAN RED CROSS helps people.                0

IV. Hyphenated Words: Count as Separate Words if the Parts Can
Stand Alone

The never-ending day is never ending.               3



PROCEDURE FOR DETERMINING
AVERAGE SYNTACTIC COMPLEXITY

The syntactic complexity of any passage or sampling of sentences
is the arithmetic average of the complexity counts of the
sentences evaluated. For example, if ten sentences had the following
counts, their average syntactic complexity would be 2.5:



Sentence  Count        Sentence  Count
1.        2            6.        2
2.        2            7.        1
3.        3            8.        4
4.        1            9.        3
5.        2            10.       5

Total 25; average 2.5
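The averaging step is a plain arithmetic mean. A minimal sketch using Python's `statistics.mean`; the function name is illustrative, and the ten counts are the ones in the table above:

```python
from statistics import mean

def average_complexity(counts: list[int]) -> float:
    """Average syntactic complexity: the arithmetic mean of sentence counts."""
    if not counts:
        raise ValueError("at least one sentence count is required")
    return mean(counts)

counts = [2, 2, 3, 1, 2, 2, 1, 4, 3, 5]  # sentences 1-10 from the table
print(sum(counts), average_complexity(counts))  # 25 2.5
```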



PROGRAMING SYNTACTIC COMPLEXITY 

Syntactic complexity of reading materials may be graded from 
a starting point of 0-count complexity to any average syntactic 
complexity count designated a terminal reading level. 

For example, syntactic complexity of materials prepared for a
primary reading program may begin at the 0-count level and progress
to an average complexity count of 3.0 to 4.0.

Application of the formula is shown in the paragraph analyzed 
below: 

Daedalus, the First Man to Fly 

(1) Daedalus jumped from the mountain top. (2) For a terrible moment, he fell straight down, his arms wobbling weakly. (3) But then he spread his wings and began to fly. (4) Like a bird, he flew straight up into the blue morning sky.



      0-Count    1-Count         2-Count                   3-Count    Total

1.    SV Adv.    adjective                                              1

2.    SV Adv.    prep. phrase                              absolute     6
                 adjective
                 adverb

3.    SVO                        coord. clause deletion:                2
                                 two lexical items

4.    SV Adv.    prep. phrase    prep. phrase:                          5
                 adjective       inverted order
                 adjective

                                                           Total       14

14 divided by 4 = 3.5 = average syntactic complexity.
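The Daedalus totals can be reproduced the same way. The sketch below is only illustrative. Each sentence is a list of (structure, count, sentence-initial adverbial) entries. Each structure's count comes from the column the table puts it in, and the extra 1-count for an inverted adverbial follows rule II above. The data layout and the name `INVERSION_BONUS` are our own.

```python
# Extra count for an adverbial of manner or place that begins the sentence (rule II).
INVERSION_BONUS = 1

# (structure, base count, is an inverted sentence-initial adverbial)
daedalus = [
    [("SV Adv.", 0, False), ("adjective", 1, False)],
    [("SV Adv.", 0, False), ("prep. phrase", 1, False), ("adjective", 1, False),
     ("adverb", 1, False), ("absolute", 3, False)],
    [("SVO", 0, False), ("coord. clause deletion: two lexical items", 2, False)],
    [("SV Adv.", 0, False), ("prep. phrase", 1, False), ("adjective", 1, False),
     ("adjective", 1, False), ("prep. phrase", 1, True)],  # "Like a bird," begins sentence 4
]


def sentence_count(structures):
    """Sum one sentence's structure counts, adding the bonus for inverted adverbials."""
    return sum(base + (INVERSION_BONUS if inverted else 0)
               for _label, base, inverted in structures)


per_sentence = [sentence_count(s) for s in daedalus]
total = sum(per_sentence)
average = total / len(per_sentence)
print(per_sentence, total, average)  # [1, 6, 2, 5] 14 3.5
```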



References 

1. Botel, Morton, and Alvin Granowsky. "A Formula for Measuring Syntactic Complexity: A Directional Effort," Elementary English, 49 (April 1972), 513-516.

2. Granowsky, Alvin. "A Formula for the Analysis of Syntactic Complexity of Primary Grade Reading Materials," unpublished doctoral dissertation, University of Pennsylvania, 1971.






Theodore A. Mork
Western Washington State College 



THE ABILITY OF CHILDREN 
TO SELECT READING MATERIALS 
AT THEIR OWN INSTRUCTIONAL 
READING LEVEL* 



During the past several years, much emphasis has been placed on programs involving self-selection of reading materials. Special library-centered reading programs expect children to select materials that are appropriate for them in terms of interest, maturity level, and reading difficulty. Authors of basal readers emphasize the importance of children's reading library books in conjunction with their basic texts, and, for the most part, the selection is left up to the children. In individualized reading programs based on self-selection of reading materials, an ability to select materials of appropriate difficulty is essential (1, 4, 6, 8, 18).

However, while several educators have emphasized that children must be allowed to select their own reading materials, the ability of children to choose materials appropriate to their reading abilities appears largely to have been assumed.

During the past four or five years, many university students in education and many practicing teachers have seriously questioned this assumption. Others have wondered how much guidance or help might be necessary to increase children's ability to select materials of appropriate levels of difficulty. Jacobs (7) suggested: "The teacher will probably have to give some guidance, helping the child to be realistic about his choices in terms of his capabilities, his aspirations, his past experiences." Vite (19), speaking from a primary teacher's experience, also suggested that the child himself selects, but with guidance and support from the teacher as needed.

*This study was supported by the Educational Research Institute of British Columbia.



MORK






The lack of information available on the difficulty level of books that children actually do select prompted this research study. Its purpose was to provide empirical information about some practical aspects of self-selection that may be helpful to teachers and to teachers of teachers. Specifically, the study sought to answer the following questions: 1) In relation to a child's instructional reading level, what level of materials, particularly library books, does a child select when he is allowed freedom to make his own choice? Do children, in fact, select materials of appropriate difficulty without guidance from the teacher? 2) Does a short, five-minute period of guidance from the teacher that emphasizes self-acceptance affect the child's selection?

In addition, the relationships between the observed discrep- 
ancy scores (the difference between instructional reading level and 
the level of materials chosen) and sex, age, and reading ability 
were explored. 

SUBJECTS 

Twenty-nine children in grade three and thirty-one in grade five were randomly selected from a group of 200 children in Victoria, British Columbia, elementary schools. These 200 children had already been selected through random procedures for a research study being conducted by Tinney (16). The children were selected from eight different schools, representing a cross-section of elementary school children. In the present study no more than eight children were selected from each school, and no more than four children were selected from each grade level (third or fifth) in each school. In all but two cases, the children came from different classrooms. According to the building principals, none of the children in the study had been involved in an individualized reading program for their reading instruction.

PROCEDURES

Each child in the study was asked to select, from each of three different sets of reading materials, a piece of material that he thought he could read fairly well by himself, with perhaps some occasional help from the teacher. This, essentially, was a definition of instructional reading level as interpreted for the child. The three sets of reading materials were: 1) single pages copied from basal readers, 2) a series of basal readers, and 3) library books.






Self-Selection of Materials 



The children at each grade level were randomly divided into two groups. Children in the guidance group were engaged in discussion relative to the differences found in children, such as running speed, height, weight, and shoe size. A comparison was drawn between shoe size and "book size" (10).

The intent of this brief session was not to "tell" the child 
anything, but rather to lead him through a sequence of questions 
and answers that would help him to conclude that it was normal 
for children to differ in their reading abilities and that selecting 
materials for "reading fit" was rather similar to finding clothing of 
the right size. Except for this brief session, the tasks of the children 
in the two groups were identical. 

Each child met with an examiner in an individual session at the child's school. If the child was in the guidance group, the examiner first discussed with him the nature of individual differences and "reading fit" as described above. The first selection task for children in both groups involved separate pages that had been randomly selected and photocopied from the first half of basal readers at successive levels of difficulty, preprimer through eighth grade. These readers were from a series unfamiliar to the children (14). The child was asked to select, from five of the pages, the page that he could read fairly well, but with which he might need just a little help from the teacher. The five pages included the page from the reader for the child's grade level and extended two levels in each direction. If the page chosen was the lowest or the highest difficulty level of the five, the child was asked to look at an even lower or an even higher level before making his final choice. If necessary, he was shown additional levels as well. The difficulty level of the material finally selected was recorded.

Next, the child was asked to perform the same task with five books from the same series of basal readers. Because interest in any single story might have affected his selection, once the child had made a tentative selection, the examiner suggested that he look at several different stories in the selected reader before making a final choice. Again, if the lowest- or highest-level reader was selected, the examiner suggested that the child look at an even lower or an even higher level reader before making his final choice.

The final step was for the child to go to the school library, 
which was a larger source than most classroom libraries, and to 
select a book to read using the same criteria described above. A 






readability check was run on the book to determine the difficulty 
level. 

The instructional reading levels for these children had been established for a separate study (16). A reading inventory had been administered by advanced students in Reading Education at the University of Victoria. The reading inventory used required the child to read one passage orally and one silently at each successive reader level. Oral reading errors were scored, as were oral and silent reading comprehension. The instructional reading level was the highest level at which the child could read with at least 95 percent word recognition and at least 70 percent comprehension. The criteria for determining the instructional levels were based on those suggested by Betts (2). The examiners in the present study were not informed of the instructional reading levels of the children until after the children had made their selections.
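Finding a child's instructional level is a threshold search over the inventory results. The sketch below is ours. It assumes each reader level gives one numeric grade level, one word-recognition percentage, and one comprehension percentage. The study scored oral and silent comprehension separately and does not say how the two were combined. The inventory figures are hypothetical.

```python
WORD_RECOGNITION_MIN = 95.0  # percent of words recognized
COMPREHENSION_MIN = 70.0     # percent comprehension


def instructional_level(results):
    """Return the highest reader level meeting both criteria, or None.

    results: (level, word_recognition_pct, comprehension_pct) tuples.
    """
    qualifying = [level for level, word_pct, comp_pct in results
                  if word_pct >= WORD_RECOGNITION_MIN and comp_pct >= COMPREHENSION_MIN]
    return max(qualifying) if qualifying else None


# Hypothetical inventory results for one child (not data from the study).
sample_inventory = [(2.0, 99.0, 90.0), (3.0, 97.0, 80.0), (4.0, 96.0, 72.0), (5.0, 93.0, 65.0)]
print(instructional_level(sample_inventory))  # 4.0
```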

The difficulty levels of the separate pages and of the basal readers were substantiated using the Fry Readability Graph (5). Readability was checked to be sure the chosen pages were representative of the difficulty of the books from which they were selected. The readability levels of the library books chosen by the children were also determined using the Fry graph.

The differences between the established instructional reading levels (grade scores) and the difficulty levels (grade scores) of materials selected by the children were determined. These differences (discrepancy scores) for the guidance group and the no-guidance group were then compared.

RESULTS AND DISCUSSION

It appeared that for many of the children, trials one and two (selecting a separate page and selecting a basal reader) functioned as a training situation. That is, several of the children seemed to be learning the task during these two trials. For this reason, the comparisons between groups were based on the discrepancy scores for the library book selections. An additional reason for using the library book selections is that library books represent more accurately the type of material with which self-selection normally occurs in the elementary classroom.

The effect of the five-minute period defined as guidance was
evaluated by a t test of the difference between the mean discrepancy






scores of the guidance and the no-guidance groups. The means and 
standard deviations of the discrepancy scores are reported in Table 1. 

TABLE 1

Means and Standard Deviations of Discrepancy Scores for Selections

Group          Statistic    Based on Signed Values(a)    Based on Absolute Values

Guidance       Mean         -0.63                         1.33
               S.D.          1.55                         1.02

No Guidance    Mean         -0.32                         1.12
               S.D.          1.57                         1.14

(a) Positive value means instructional level higher than readability level of selection.

Note that means and standard deviations are reported in two ways, first using their signed (positive or negative) values, then their absolute values. Because the positive and negative discrepancy scores tended to offset each other, the means of the signed values are somewhat deceiving. The means of the absolute values give a more accurate picture of the actual distances between instructional reading levels and the levels of material selected. The difference between the guidance and the no-guidance groups was not significant for either the signed or absolute discrepancy scores, and it must be concluded that the guidance had no effect. Whether regularly repeated guidance of this or another sort might be effective remains to be studied.
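A handful of made-up scores shows the offsetting effect. In this sketch the discrepancy is instructional level minus readability level, as in the footnote to Table 1. The five children and their scores are hypothetical.

```python
from statistics import mean


def discrepancies(instructional, selected):
    """Signed discrepancy: instructional level minus readability of the selection.

    A positive value means the child chose material below his instructional level.
    """
    return [i - s for i, s in zip(instructional, selected)]


# Hypothetical grade scores for five children (not data from the study).
instructional = [3.0, 4.5, 5.0, 3.5, 6.0]
selected = [4.0, 3.5, 5.0, 5.5, 5.0]

signed = discrepancies(instructional, selected)
absolute = [abs(d) for d in signed]
print(signed)          # [-1.0, 1.0, 0.0, -2.0, 1.0]
print(mean(signed))    # -0.2
print(mean(absolute))  # 1.0
```

The signed mean of -0.2 makes these children look almost on target. The absolute mean shows they were, on average, a full year away from their instructional levels.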

After it had been established that no significant difference existed between the guidance and the no-guidance groups, the data from all subjects were combined to determine the level of materials children actually do select relative to their instructional level. A t test for matched groups was used to evaluate the difference between the children's instructional reading levels and the difficulty levels of their library book selections. The mean instructional reading level (4.77) for the 60 children was significantly lower than the mean difficulty level (5.24) of their selections (t = 2.33, p < .05). Thus, the children did not, as a group, select materials with difficulty levels equal to their established instructional reading levels but tended to select material at a higher level.






However, the actual difference between the mean instructional reading level and the mean difficulty level of the library books selected was less than one-half year (5.24 - 4.77 = 0.47).
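A t test for matched groups works on each child's own difference between the two levels. The sketch below computes the statistic from scratch with the standard library. The five pairs are hypothetical. With the study's 60 children there would be 59 degrees of freedom, and the two-tailed .05 critical value for 59 degrees of freedom is about 2.0.

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """t statistic and degrees of freedom for matched pairs (x[i], y[i]).

    t = mean difference / (standard deviation of differences / sqrt(n)).
    """
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two pairs are required")
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1


# Hypothetical levels for five children (not data from the study).
book_levels = [5.0, 4.0, 6.0, 5.0, 3.0]   # readability of the selected book
instr_levels = [4.0, 4.0, 4.0, 4.0, 4.0]  # established instructional level

t, df = paired_t(book_levels, instr_levels)
print(round(t, 2), df)  # 1.18 4
```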

How large a difference can be tolerated between a child's instructional reading level and the difficulty level of the material he selects, and still have the material appropriate for him? McCracken (9, 10) contends that a child's instructional reading level usually is a range of two or more book levels. We know that children's interests often allow them to read materials otherwise thought to be too difficult for them. The data were examined to see how many children had discrepancy scores on their library book selections of one year or less, that is, chose a library book within one year of their instructional reading level. For this analysis, the children were placed into groups of high, middle, or low reading ability on the basis of their instructional reading levels in relation to their grade levels. The study took place in April; therefore, in grade three, children whose instructional reading levels were greater than 3.5 were considered high in reading ability, those between 3.0 and 3.5 were classified as middle reading ability, and those less than 3.0 were placed in the low reading ability group. In grade five, the corresponding high, middle, and low reading ability groups were those whose instructional reading levels were greater than 5.5, between 5.0 and 5.5, and less than 5.0.
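The grouping rule and the one-year criterion can be written as two small functions. This restatement is ours and covers only the two grades the study used. Treating the 3.0 and 3.5 (or 5.0 and 5.5) end points as inclusive for the middle group is how we read "between". Rounding the difference to one decimal is an added assumption that keeps floating-point error from deciding borderline cases.

```python
# (lower bound, upper bound) of the middle group for an April testing date.
CUTOFFS = {3: (3.0, 3.5), 5: (5.0, 5.5)}


def ability_group(grade, level):
    """Classify an instructional level as 'high', 'middle', or 'low' for grade 3 or 5."""
    low_bound, high_bound = CUTOFFS[grade]
    if level > high_bound:
        return "high"
    if level >= low_bound:
        return "middle"
    return "low"


def within_one_year(instructional, selected):
    """True if the selection lies within one year of the instructional level."""
    return round(abs(instructional - selected), 1) <= 1.0


print(ability_group(3, 3.6), ability_group(3, 3.5), ability_group(5, 4.8))  # high middle low
print(within_one_year(4.7, 3.7), within_one_year(4.7, 5.8))                # True False
```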

TABLE 2

Number and Percentage of Subjects Selecting Library
Books with Difficulty Levels Within One Year
of Their Instructional Reading Levels

Reading        Grade 3            Grade 5            All Subjects
Ability        No.         %      No.         %      No.         %

High           10/16(a)   63      7/9        78      17/25      68
Middle         4/5        80      9/15       60      13/20      65
Low            2/8        25      5/7        71      7/15       47
Total          16/29      55      21/31      68      37/60      62

(a) Read as 10 out of 16.
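The percentages in Table 2 follow directly from the fractions. The sketch below recomputes them and rounds halves upward, which is what the 63 for 10/16 = 62.5 implies. It also checks that the three ability rows add up to the Total row in each column.

```python
from decimal import Decimal, ROUND_HALF_UP

# (number within one year, number of subjects) for Grade 3, Grade 5, All Subjects.
TABLE_2 = {
    "High": [(10, 16), (7, 9), (17, 25)],
    "Middle": [(4, 5), (9, 15), (13, 20)],
    "Low": [(2, 8), (5, 7), (7, 15)],
    "Total": [(16, 29), (21, 31), (37, 60)],
}


def percent(k, n):
    """Whole-number percentage of k out of n, rounding halves up."""
    value = Decimal(100 * k) / Decimal(n)
    return int(value.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


percentages = {row: [percent(k, n) for k, n in cells] for row, cells in TABLE_2.items()}

# In each column, the High, Middle, and Low rows should sum to the Total row.
for col in range(3):
    k_sum = sum(TABLE_2[row][col][0] for row in ("High", "Middle", "Low"))
    n_sum = sum(TABLE_2[row][col][1] for row in ("High", "Middle", "Low"))
    assert (k_sum, n_sum) == TABLE_2["Total"][col]

for row, values in percentages.items():
    print(row, values)
```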



It can be seen from Table 2 that 62 percent of all the subjects 
selected library books with difficulty levels within one year of 






their established instructional reading levels. The percentage of children attaining the criterion of plus or minus one year increases as reading ability increases and is somewhat higher for the older children. It is important to note that the children in grade three whose instructional reading levels are below grade level did least well in selecting appropriate library books. Only two out of eight selected material within one year of their established reading levels. Of the 37 children who selected library books within one year of their instructional levels, 19 were female, 18 were male, 21 were in grade five, and 16 were in grade three.

Eighteen children obtained a discrepancy score of zero for their library book selections. This was nearly one-third of the subjects. Seven were male, eleven were female, ten were fifth graders, and eight were in third grade. Of the eight in grade three, six were in the high reading ability group, with one each in the middle and low groups. Of the ten subjects in grade five, five were in the high group, three were in the middle group, and two were in the low group. These figures suggest that the higher the reading ability, the more likely a child will be to make appropriate book selections in terms of difficulty levels.

CONCLUSIONS 

The following conclusions appear warranted by the results of this study.

1. A five-minute period of individual guidance as defined in 
this study will not influence a child to select reading materials 
more appropriate to his instructional reading level. It is possible, 
however, that used over a period of weeks or months, this 
approach might be successful. 

2. Many children are able to select reading materials whose difficulty levels are exactly the same as their instructional reading levels as determined by informal reading inventories. Nearly one-third of the subjects obtained a discrepancy score of zero for their library book selections.

3. If it is accepted that materials appropriate for a given child 
range from one year below to one year above his instructional 
reading level, then the majority of children in grades three and five 
can choose appropriate books but a substantial minority will need 
guidance in making selections. More than 60 percent of the chil- 
dren selected materials appropriate to their reading levels, but 






nearly 40 percent did not. 

4. On the whole, older children and better readers appear 
somewhat more able to select reading materials of appropriate 
difficulty. 

LIMITATIONS OF THE STUDY AND
RECOMMENDATIONS FOR FURTHER RESEARCH

The results of this study should be interpreted in light of certain limitations. Some of these limitations and some of the positive findings suggest profitable further research.

Use of an informal reading inventory could have allowed for error in determining the base level from which to make comparisons. Use of a standardized reading inventory such as the Spache Diagnostic Reading Scales (15) or the Standard Reading Inventory (9) might have provided increased accuracy.

The role interest played in the selection of library books was not controlled. It is generally accepted that the child's interests and experiential background may cause him to choose a book that is somewhat above his general instructional reading level but that his interest and knowledge of specialized vocabulary may allow him to read profitably. Evaluation of how well the child could read the specific material he selected would carry the investigation a step farther and provide some information on the effect of interest and specific subject matter on the child's instructional level. For this evaluation, the standard criteria for assessing difficulty of materials through oral reading (2) could be employed, or the cloze procedure (3) could be used. In a future study, also, having each child make more than one library book selection would provide additional confidence in results.

According to the building principals, the subjects of this study had never been exposed to an individualized reading program. The majority of their reading instruction had been from basal readers. Even though most basal reading programs do make recommendations about the use of library books, including some suggestions on self-selection, little emphasis had been placed on helping these children to select appropriate reading materials. It would be particularly interesting to repeat this study with a group of children who, for several months, had been involved in a reading program based on self-selection. A brief longitudinal study could clarify the effects of continued guidance and practice on the ability of children to select materials of appropriate difficulty.






References 

1. Barbe, Walter B. Educator's Guide to Personalized Reading Instruction. Englewood Cliffs, New Jersey: Prentice-Hall, 1961.

2. Betts, Emmett. Foundations of Reading Instruction. New York: American Book, 1946, 445-450.

3. Bormuth, John R. "Cloze Test Readability: Criterion Reference Scores," Journal of Educational Measurement, 5 (Fall 1968), 189-196.

4. Evans, N. Dean. "Individualized Reading: Myths and Facts," Elementary English, 39 (October 1962), 580-583.

5. Fry, Edward B. "A Readability Formula That Saves Time," Journal of Reading, 11 (April 1968), 513-516.

6. Hunt, Lyman C. "The Individualized Reading Program: A Perspective," in L. C. Hunt (Ed.), The Individualized Reading Program: A Guide for Classroom Teaching, 1966 Proceedings, Volume 11, Part 3. Newark, Delaware: International Reading Association, 1967, 1-6.

7. Jacobs, Leland. "Individualized Reading Is Not a Thing," in Alice Miel (Ed.), Individualizing Reading Practices. New York: Teachers College Press, 1958, 1-17.

8. Lazar, May, Marcella Draper, and Louise Schwietert. A Practical Guide to Individualized Reading. New York: Board of Education of the City of New York, 1960.

9. McCracken, Robert A. The Standard Reading Inventory. Klamath Falls, Oregon: Klamath Printing, 1966.

10. McCracken, Robert A. "Basic Principles of Reading Instruction in the Seventh Grade," High Trails: Teacher's Edition, Sheldon Basic Reading Series. Boston: Allyn and Bacon, 1968.

11. Miel, Alice (Ed.). Individualizing Reading Practices. New York: Teachers College Press, Columbia University, 1958.

12. Olson, Willard C. Child Development. Boston: Heath, 1949.

13. Olson, Willard C. "Seeking, Self-Selection, and Pacing in the Use of Books by Children," Packet. Boston: Heath, 1952, 3-10.

14. Sheldon, William D., et al. Sheldon Basic Reading Series. Boston: Allyn and Bacon, 1965.

15. Spache, George. Spache Diagnostic Reading Scales. Monterey, California: California Test Bureau, 1964.

16. Tinney, Ronald. "A Study Comparing Instructional Reading Levels with Difficulty Levels of Materials Children Are Expected to Read," unpublished paper, University of Victoria, 1970.

17. Veatch, Jeannette. Individualizing Your Reading Program. New York: G. P. Putnam's Sons, 1959.

18. Veatch, Jeannette, and Phillip Acinapuro. Reading in the Elementary School. New York: Ronald Press, 1966.

19. Vite, Irene. "A Primary Teacher's Experience," in Alice Miel (Ed.), Individualizing Reading Practices. New York: Teachers College Press, 1958, 18-43.

20. West, Roland. Individualized Reading Instruction. Port Washington, New York: Kennikat Press, 1964.






Martin H. Jason 
and 

Beatrice Dubnow 



THE RELATIONSHIP BETWEEN 
SELF-PERCEPTIONS OF 
READING ABILITIES AND 
READING ACHIEVEMENT 



Roosevelt University 



"The self-concept, as operationally defined in various studies, has 
received the primary focus of research efforts in the area of self- 
perception and reading achievement. Treated as a global variable 
reflecting a pupil's generalized view of himself, the self-concept 
has been reported as being positively related to reading achieve- 
ment in the majority of investigations. Studies in which the rela- 
tionship has been supported at various grade levels include Bodwin 
(2), Lumpkin (8), Lamy (7), Wattenberg and Clifford (10), and
Williams and Cole (12).

A review of the literature has not disclosed any studies em- 
ploying self-report scales which involve specific reading abilities. 
One projective instrument (Reading Apperception Test) that dealt
specifically with reading was developed by Hake (5). The test,
designed to evaluate covert motivations of good and poor readers, 
contains ten ambiguously drawn pictures depicting children in 
various reading situations. When the instrument was administered 
to a sample of 80 sixth grade pupils, the results revealed, among 
other findings, that below average readers had significantly lower 
self-concepts than above average readers. 

Since no self-report scale involving reading appeared to be 
available, it was the basic intent of the present investigation to 
develop such an instrument and begin initial testing in order to 
make judgments concerning its potential usefulness. 

The theoretical base underlying the present investigation is
derived primarily from the phenomenological principle that the






phenomenal self, as the organization of all perceptions an individual has about himself in a particular situation, governs his behavior in that situation (3). What is relevant in terms of understanding inadequate reading performance is that while a pupil's difficulties may be a function of perceptions commensurate with that performance, these perceptions nonetheless satisfy a basic need. What would then on its surface appear as self-defeating pupil behavior is quite the opposite when considered from the phenomenological perspective. Briefly stated, this dimension of the theory as offered by Combs and Snygg holds that since the maintenance and enhancement of the phenomenal self is a fundamental human need, perceptions which are consistent with that self are selected whether they appear complimentary or self-damaging to an outsider. Perceptions which are inconsistent are unlikely to occur as they would not fit the self structure. As applied to the reading situation,

Most of the cases coming to the reading clinic are poor readers who have nothing whatever wrong with their eyes. They are not unable to read in a physical sense, but are children who for one reason or another have come to believe they cannot read. What is more, because they see themselves as nonreaders, they approach reading expecting to do badly, and a fine vicious circle gets established . . .

This cyclical effect is also indicated by Quandt (9), who states that "Children . . . who come to school believing that they will not succeed in reading, as well as children who gain this concept at a later time, may become victims of a self-fulfilling prophecy. Believing that they will not succeed in reading their behaviors and efforts during reading instruction contribute to making their expectations come true." In this same regard, "A child who, for whatever reason, develops negative self-perceptions may see himself as an inadequate reader, incapable of learning, or just generally inadequate" (1). More positively, "If the child is highly proficient in extracting ideas from the printed page and he recognizes this, he will have a positive approach to reading. He is able to read, therefore his concept of himself is as a 'reader'" (5).

The application of phenomenological theory is reflected in the 
assumption that the perceptions a pupil holds regarding his reading 
abilities serve to either facilitate or inhibit his reading perform- 
ance. The following research hypotheses were formulated to test 






this assumption: There is a positive relationship between self-per- 
ceptions of reading abilities and achievement in 1) vocabulary and 
2) reading comprehension. The same prediction was made sepa- 
rately for boys and girls. 

METHOD

The Self-Report Reading Scale, a 20-item instrument requiring "Yes" or "No" responses, was designed for group administration. Its purpose was to measure elementary school pupils' perceptions of their reading abilities. Representative items include:

Most of the time I feel I will never be a good reader in school.

I feel that there are too many hard words for me to learn in the 
stories I read. 

I can read as fast as the good readers. 

Most of the time when I see a new word I can sound it out by
myself. 

The pupil was given one point for each item to which he gave an answer representing a positive self-perception. In order to help ensure that a pupil's perceptions would not be inaccurately reported because of difficulty with vocabulary, words above a third grade reading level were not included. The split-half reliability of the Self-Report Reading Scale corrected for test length was 0.88 for the group of fifth graders participating in the study. Other instruments employed in the study included the Otis-Lennon Mental Ability Test, Elementary II Level, Form J, and the Iowa Tests of Basic Skills, Vocabulary and Reading Comprehension tests, Form 3.
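Scoring and the reliability estimate can be sketched together. The study says only that the split-half coefficient was "corrected for test length." The usual correction for doubling a test's length is the Spearman-Brown formula, 2r / (1 + r), and we assume that is what was used. The four items are the sample items quoted above. Everything else is our own illustrative assumption: which answer counts as positive for each item, the odd-even split, and the five pupils' responses. The reported 0.88 came from the full 20-item scale and is not reproduced here.

```python
import math

# Answer taken as the positive self-perception for each sample item (our assumption).
KEY = ["No",   # Most of the time I feel I will never be a good reader in school.
       "No",   # I feel that there are too many hard words for me to learn ...
       "Yes",  # I can read as fast as the good readers.
       "Yes"]  # Most of the time when I see a new word I can sound it out ...


def score(responses, key):
    """One point for each item answered in the positive direction."""
    return sum(r == k for r, k in zip(responses, key))


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_half):
    """Estimate full-length reliability from the correlation between two halves."""
    return 2 * r_half / (1 + r_half)


# Hypothetical responses for five pupils (not data from the study).
responses = [
    ["No", "No", "Yes", "Yes"],
    ["Yes", "Yes", "No", "No"],
    ["No", "No", "No", "Yes"],
    ["No", "No", "Yes", "No"],
    ["Yes", "Yes", "No", "No"],
]

odd_half = [score(r[0::2], KEY[0::2]) for r in responses]   # items 1 and 3
even_half = [score(r[1::2], KEY[1::2]) for r in responses]  # items 2 and 4
r_half = pearson(odd_half, even_half)
print([score(r, KEY) for r in responses])        # [4, 0, 3, 3, 0]
print(r_half, round(spearman_brown(r_half), 3))  # 0.75 0.857
```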

All nine fifth grade classes from a Chicago suburban school district participated in the study. The pupils were grouped according to a multi-age plan. Of the 247 pupils in these classes, 231 were present for all of the testing, and only their scores were analyzed.

Arrangements were made to have all testing done with only fifth graders present. Teachers in each of the nine classes administered the achievement and IQ tests during the week prior to the administration of the Self-Report Reading Scale. One of the investigators administered this instrument, which took approximately 15 minutes to complete.






RESULTS 

The correlations between scores on the Self-Report Reading 
Scale and on the reading achievement tests are shown in Table 1. 
Both zero-order correlations and partial correlations with IQ par- 
tialed out are given. 
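Partialling IQ out of a correlation uses the first-order partial correlation formula. Table 1 does not report how IQ correlated with the other measures, so the sketch below cannot reproduce the controlled coefficients. The IQ correlations in the first example are hypothetical. The t statistic on n - 3 degrees of freedom is the standard significance test for a first-order partial correlation, but we are only assuming the authors used it.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with z held constant (first-order partial correlation)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_t(r, n):
    """t statistic and degrees of freedom for a first-order partial correlation."""
    df = n - 3
    return r * math.sqrt(df) / math.sqrt(1 - r ** 2), df


# x = self-report score, y = vocabulary, z = IQ.
# r_xy = .48 is the zero-order value for all pupils in Table 1;
# r_xz = .50 and r_yz = .60 are hypothetical.
print(round(partial_r(0.48, 0.50, 0.60), 3))  # 0.26

# Test statistic for the boys' controlled vocabulary coefficient (r = .19, N = 114).
t, df = partial_t(0.19, 114)
print(round(t, 2), df)  # 2.04 111
```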

TABLE 1

Means, SDs, and Intercorrelations
of Self-Report Reading Scale
and Reading Achievement

                                          Correlation with Self-Report
                                     IQ Not Controlled      IQ Controlled
Test               Mean      SD        r        p            r        p

Boys (N = 114)
Self-Report        12.52     4.12
Vocabulary         21.11     7.61     .36     .001          .19     .02
Comprehension      32.59    13.20     .34     .001          .15     .06

Girls (N = 117)
Self-Report        13.27     5.17
Vocabulary         22.73     6.94     .58     .001          .37     .001
Comprehension      37.41    12.43     .52     .001          .28     .002

Both Boys and Girls (N = 231)
Self-Report        12.90     4.68
Vocabulary         21.93     7.30     .48     .001          .28     .001
Comprehension      35.03    13.01     .44     .001          .22     .001


An examination of Table 1 reveals significant, although not high, relationships between self-perceptions of reading abilities and "vocabulary" and "comprehension." When IQ was partialled out, the relationships were still significant at the .02 level or less except for "boys' comprehension," which revealed a .06 level of probability. An analysis of correlations of the Self-Report Reading Scale with "vocabulary" indicated that they were not significantly different from corresponding correlations with "comprehension."

CONCLUSIONS 

The results to a certain degree support the hypotheses which 
predicted that there is a positive relationship between self-report 
measures concerning reading abilities and reading achievement. 






Although the lower coefficients obtained after IQ was partialled out permit only tentative conclusions, the findings do indicate a consistent trend in the predicted direction.

The fact that coefficients were higher in all analyses involving girls may be related to the overall superiority of girls in reading achievement, which is evident beyond first grade and continues throughout the elementary grades (4, 11).

In terms of further research, experimental efforts may reveal the extent to which negative perceptions could be changed. An analysis of gains made from pretest to posttest in self-perceptions of reading abilities and achievement would yield additional data on the relationship between these variables. Concomitant improvement in both areas could yield information on the role of self-perceptions as an intervening variable, i.e., one that would have the pivotal effect of influencing achievement positively.

In conclusion it is felt that the Self-Report Reading Scale 
could be useful in sensitizing teachers to the importance of self- 
perceptions in the reading process. By becoming apprised of per- 
ceptions pupils hold, teachers could utilize the information from 
this instrument in remedial or individualized programs. In this 
regard an examination of responses for individual items might 
provide further direction for the diagnostic process. Through the 
child's identification of certain areas of concern to him, tests 
which diagnose specific areas of deficiency in greater depth can 
next be employed. The instrument may thus serve its best purpose 
if it facilitates the communication of a poor reader's feelings of
inadequacy to his teacher.

References 

1. Beretta, Shirley. "Self-Concept Development in the Reading Program," Reading Teacher, 23 (December 1970), 232-239.

2. Bodwin, Raymond F. "The Relationship Between Immature Self-Concept and Certain Educational Disabilities," unpublished doctoral dissertation, Michigan State University, 1957.

3. Combs, Arthur W., and Donald Snygg. Individual Behavior. New York: Harper & Brothers, 1959.

4. Gates, Arthur I. "Sex Differences in Reading Ability," Elementary School Journal, 61 (May 1961), 431-434.

5. Hake, James M. "Covert Motivations of Good and Poor Readers," Reading Teacher, 22 (May 1969), 731-738.

6. Homze, Alma Cross. "Reading and the Self-Concept," Elementary English, 39 (March 1962), 210-215.






7. Lamy, Mary W. "Relationship of Self-Perceptions of Early Primary Children to Achievement in Reading," unpublished doctoral dissertation, University of Florida, 1962.

8. Lumpkin, Donavon D. "Relationship of Self-Concept to Achievement in Reading," unpublished doctoral dissertation, University of Southern California, 1959.

9. Quandt, Ivan. Self-Concept and Reading. Newark, Delaware: International Reading Association, 1972, 9.

10. Wattenberg, William W., and Clare Clifford. "Relation of Self-Concepts to Beginning Achievement in Reading," Child Development, 35 (June 1964), 461-467.

11. Weintraub, Samuel. "What Research Says to the Reading Teacher," Reading Teacher, 19 (November 1966), 155-165.

12. Williams, Robert L., and Spurgeon Cole. "Self-Concept and School Adjustment," Personnel and Guidance Journal, 46 (January 1968), 478-481.






